Abstract

Two key traits of 5G cellular networks are much higher base station (BS) densities - especially in the case 
of low-power BSs - and the use of massive MIMO at these BSs. This paper explores how massive MIMO can be 
used to jointly maximize the offloading gains and minimize the interference challenges arising from adding small 
cells. We consider two interference management approaches: joint transmission (JT) with local precoding, where 
users are served simultaneously by multiple BSs without requiring channel state information exchanges among 
cooperating BSs, and resource blanking, where some macro BS resources are left blank to reduce the interference 
in the small cell downlink. A key advantage offered by massive MIMO is channel hardening, which makes it
possible to predict instantaneous rates a priori. This allows us to develop a unified framework, where resource allocation is
cast as a network utility maximization (NUM) problem, and to demonstrate large gains in cell-edge rates based 
on the NUM solution. We propose an efficient dual subgradient based algorithm, which converges towards the 
NUM solution. A scheduling scheme is also proposed to approach the NUM solution. Simulations illustrate more 
than 2x rate gain for 10th percentile users vs. an optimal association without interference management. 

I. Introduction 

Smart densification and heterogeneity in base station (BS) deployments and massive MIMO are
considered two of the most important technologies in 5G cellular systems [1-3].^1 The massive MIMO
regime is the setting in which the number of antennas at a BS is significantly larger than the number
of users that are simultaneously served by the BS [4-6]. In this paper, various aspects such as user
association, load balancing, scheduling and interference management are considered for future networks
with massive MIMO deployments where these technologies are adapted together.

^1 As higher-frequency spectrum becomes available, large arrays become practical even for small cells. For example, in the 3.5 GHz band, a
36-antenna array (arranged on a square grid at half-wavelength separation) can be implemented on a 26 cm x 26 cm surface. The required
implementation area becomes smaller as the carrier frequency increases.
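The array-size figure quoted in the footnote can be checked with a short calculation. The sketch below assumes that each antenna element occupies a half-wavelength cell, so an n x n grid spans n half-wavelengths per side; the function name and the 28 GHz comparison point are illustrative assumptions, not values from this paper.

```python
C = 299_792_458.0  # speed of light in m/s


def square_array_side_cm(num_antennas: int, carrier_hz: float) -> float:
    """Side length (cm) of a square antenna grid with half-wavelength pitch.

    Assumption: each element occupies a half-wavelength cell, so an
    n x n grid spans n * (wavelength / 2) per side.
    """
    n = round(num_antennas ** 0.5)
    if n * n != num_antennas:
        raise ValueError("num_antennas must be a perfect square")
    wavelength_m = C / carrier_hz
    return 100.0 * n * wavelength_m / 2.0


# 36 antennas at 3.5 GHz (the case quoted in the footnote).
side_3p5 = square_array_side_cm(36, 3.5e9)
# Same array at a hypothetical higher carrier frequency (28 GHz).
side_28 = square_array_side_cm(36, 28e9)
```

Under these assumptions the 3.5 GHz array comes out slightly under 26 cm per side, and the footprint shrinks in proportion to the wavelength at higher carrier frequencies.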

A. Motivation and Related Work 

Conventionally, mobile user equipments (UEs) are served by the BS providing the largest
signal-to-interference-plus-noise ratio (SINR) or the largest received power [7] - called max-SINR association
in this paper. In heterogeneous networks (HetNets), however, different types of BSs can have large
disparities in transmit power, so a max-SINR association results in heavily congested macrocell BSs
and lightly loaded low-power small cell BSs. This results in a very inefficient use of available
time-frequency resources, and strongly motivates load balancing, which in effect means pushing some UE
traffic onto lightly loaded small cells even if it requires reducing their SINRs by many dB [8].

Load Balancing. Several approaches have been used to study load balancing in HetNets, including
stochastic geometry [9,10], game theory [11] and system-level simulations [12,13]. Meanwhile, in
industry, proactive load balancing is accomplished by biasing UE association towards the small cells
[12,13].^2 Our initial study on load balancing [14] formulated a network utility maximization (NUM)
problem for user association in HetNets with single-antenna BSs, where resources are equally allocated
among users in the same cell. The equal resource allocation can be suboptimal if the user associations
happen on a much slower time scale than the channel variations. In general, the user association and
scheduling (resource allocation) problems are coupled, and it is quite difficult to jointly optimize them.
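As a minimal sketch of biased association, the snippet below picks for each UE the BS with the largest received power after adding a per-BS bias. The received powers, the BS labels and the 10 dB bias are hypothetical values chosen only to show how biasing moves users from the macro BS to the small cell.

```python
def associate(rx_power_dbm, bias_db):
    """Return the BS with the largest biased received power.

    rx_power_dbm: dict BS name -> received power at the UE (dBm)
    bias_db:      dict BS name -> association bias (dB); missing means 0 dB
    """
    return max(rx_power_dbm, key=lambda bs: rx_power_dbm[bs] + bias_db.get(bs, 0.0))


# Hypothetical received powers (dBm) at three UEs.
ues = {
    "ue1": {"macro": -70.0, "small": -95.0},
    "ue2": {"macro": -80.0, "small": -86.0},
    "ue3": {"macro": -85.0, "small": -83.0},
}

# Largest-received-power association without any bias.
no_bias = {ue: associate(p, {}) for ue, p in ues.items()}
# Association with a 10 dB bias toward the small cell layer.
biased = {ue: associate(p, {"small": 10.0}) for ue, p in ues.items()}
```

In this toy setup the bias offloads only ue2, whose small cell power is within 10 dB of its macro power; ue1 stays on the macro BS because the gap is larger than the bias.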

Massive MIMO. A key benefit of massive MIMO is that the extra diversity afforded by the large
antenna array averages out the fast fading, and thus the instantaneous rate stabilizes to the long-term
mean, which changes on slow time scales. As shown in [15], the instantaneous rates can be predicted with
peak-rate proxies, which are independent of scheduled instances. This property allows the decoupling of
user association and scheduling, which is exploited to achieve near-optimal load balancing in massive
MIMO HetNets with cellular transmission (where data for each user is transmitted from a single BS) [15].
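The averaging effect described above can be illustrated numerically. The sketch below assumes i.i.d. unit-variance complex Gaussian fading and looks at the normalized channel gain, the squared norm of the channel vector divided by the number of antennas M; the trial count, seed and values of M are arbitrary illustrative choices.

```python
import random
import statistics


def normalized_gain(num_antennas, rng):
    """(1/M) * ||h||^2 for h with i.i.d. CN(0, 1) entries."""
    total = 0.0
    for _ in range(num_antennas):
        # Real and imaginary parts each have variance 1/2.
        re = rng.gauss(0.0, 0.5 ** 0.5)
        im = rng.gauss(0.0, 0.5 ** 0.5)
        total += re * re + im * im
    return total / num_antennas


def gain_spread(num_antennas, trials=2000, seed=0):
    """Empirical mean and standard deviation of the normalized gain."""
    rng = random.Random(seed)
    samples = [normalized_gain(num_antennas, rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.pstdev(samples)


mean_1, std_1 = gain_spread(1)        # single antenna: gain fluctuates strongly
mean_100, std_100 = gain_spread(100)  # large array: gain concentrates near 1
```

For this fading model the standard deviation of the normalized gain scales as one over the square root of M, which is why the spread for M = 100 is roughly a tenth of the single-antenna spread.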

Coordinated multi-point transmission. MIMO techniques also provide the option of serving a user
at high rates from multiple BSs - referred to as coordinated multi-point transmission (CoMP), which
is proposed as one of the core features in LTE-Advanced [16-18]. The set of BSs that cooperatively
serve the same user is referred to in this paper as a BS cluster. Paper [19] studies how to determine the BS
clusters, while [20-26] investigate jointly optimized designs involving some of the following aspects: BS
cluster selection, beamforming (e.g., coordinated beamforming and joint transmissions), user scheduling
and power allocation. In studies such as [23-26], the complexity of the proposed algorithms can become
prohibitive as the number of antennas increases. On the other hand, an efficient suboptimal precoder for
the single-cell scenario with large antenna arrays is proposed in [23].

^2 Biasing refers to artificially adding a bias value (e.g., 10 dB) to the received signal power from the small cell layer at UEs.

Resource Blanking. As the macrocells that users are offloaded from now become strong interferers
for these offloaded users, the increased interference eats into the gains offered by load balancing. This
motivates us to jointly consider user association and interference management. Besides CoMP, another
popular interference management approach is to leave some macro resource blocks (RBs) blank, similar
to enhanced intercell interference coordination (eICIC) in 3GPP [27]. The key difference between RB
blanking in our work and eICIC is that eICIC focuses on the time domain, while in this work blanking
is applied in both time and frequency domains. We call the RBs where macro BSs are muted the blank
RBs, while the rest of the RBs are called normal RBs. Several works have considered the joint problem
of user association and RB blanking. For example, [28,29] propose a dynamic approach adapting the
muting duty cycle to load variations, while [30-34] consider a more static approach.

For general multi-cell massive MIMO HetNets, the joint optimization of user association and
interference management including joint transmission and resource blanking is still an open issue. In this
work, we combine various aspects of resource allocation and interference management including resource
blanking, joint transmission, association, user scheduling, etc. for massive MIMO deployments. We focus
on a distributed-MIMO form of CoMP, which allows local precoding at each BS and does not require
channel state information (CSI) exchanges among cooperating BSs [35]. We call this specific form of
CoMP Local Joint Transmission (LJT). LJT allows us to develop a systematic resource allocation
approach for CoMP (including cellular transmission as a special case). Other interference management
approaches (e.g. [36]) can also be adopted at the cost of additional complexity and overheads (e.g.,
schemes with joint precoding as discussed in [18]), but such designs are beyond the scope of this paper.

Cross-layer Optimization. To study the joint user association and interference management problem, 
we propose to use the cross-layer optimization approach, aiming to improve the rate distribution, 
particularly the cell-edge performance. Cross-layer optimization is quite popular in the study of resource 
allocation (see, e.g., [37,38] and references therein). Among these, besides studies with disjoint clusters
(i.e., each BS belongs to at most one cluster on any RB), e.g. [22], user-specific clusters with overlapping
BSs have also been considered [39,40]. At any scheduling instant, the cluster formation method we
consider can be described using the “Dynamic Cooperation Clusters (DCC)” concept of [39]. An important
difference between works following the DCC scheme [39] and our work resides in the selection of which
BSs serve which users (which also inherently specifies the active clusters on an RB). For the former case,
this selection is based on instantaneous channel gains between users and BSs as in [40]. In particular,
the resource allocation in [39,40] (consisting of precoding design, scheduling and power allocation)
is done at each RB to optimize, for example, instantaneous user rates for some utility function. On 
the other hand, in our setting massive MIMO rate hardening allows us to allocate resources over many 
RBs to optimize a utility function over long-term user rates. Thus, the BS cluster selection for each 
user is determined as a result of load balancing and resource allocation across many RBs, which are 
performed ahead of time at a much coarser time scale than an individual-RB time scale. We then present 
scheduling policies at a finer time scale (i.e., RB level) to approach the optimized (coarser time scale) 
resource allocation.^3

B. Contributions and Organization 

In this paper, we present a novel framework for the joint optimization of user association and 
interference management in massive MIMO HetNets, resulting in the following main contributions. 

A unified NUM problem. By exploiting the predictable instantaneous rate, user association and
scheduling problems can be decoupled, allowing us in Sec. IV to formulate a unified convex optimization
problem for resource allocation with both LJT and RB blanking. Note that in the considered LJT, the
clusters are user-specific (i.e., different users can be served by different clusters). The formulated NUM
problem can also be applied to scenarios where some bandwidth resources are explicitly reserved for
macro or small cell operation, while some resources are reserved for being shared by both layers. As an
extension of [15], the optimal solutions can always be realized by a suitably designed scheduler when
blanking is used in cellular transmission. On the other hand, with LJT, we show that there exist some
solutions that are not implementable. Naturally, the solution of the NUM problem - called the NUM
solution - upper bounds the network performance and can serve as a useful benchmark.

Dual subgradient based algorithm. Sec. V presents an efficient algorithm based on the dual
subgradient method, which converges towards the optimal dual variables. As the objective function is not
strictly convex, it is difficult to recover the optimal primal variables from the optimal dual variables. Exploiting
the solution structure, we formulate a small-size linear program (LP) to obtain the optimal primal variables.
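To give a feel for the dual subgradient method in general, the sketch below applies it to a toy NUM problem that is not the one formulated in this paper: maximize the sum of log(x_k) over K users subject to the single constraint that the x_k sum to at most 1. The function name, step size and iteration count are illustrative assumptions. This toy objective is strictly concave, so the primal solution follows directly from the dual variable, which is exactly what does not hold in the paper's setting.

```python
def dual_subgradient_toy(num_users, step=0.5, iters=500, lam=1.0):
    """Toy NUM: maximize sum_k log(x_k) subject to sum_k x_k <= 1.

    For a fixed price lam, each user maximizes log(x_k) - lam * x_k,
    which gives x_k = 1 / lam. The price is then raised when the
    constraint is violated and lowered otherwise (a projected
    subgradient step on the dual problem).
    """
    x = [1.0 / lam] * num_users
    for _ in range(iters):
        x = [1.0 / lam] * num_users           # per-user best response
        violation = sum(x) - 1.0              # constraint violation at this price
        lam = max(lam + step * violation, 1e-9)  # keep the price positive
    return lam, x


lam_star, x_star = dual_subgradient_toy(4)
```

For four users the price settles at 4 and every user receives a quarter of the resource, which is the proportional-fair split expected for identical log utilities.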

^3 In this paper, similar to [29], we also consider resource allocation over two different time scales. [29] exploits the sparsity of the
interference graph of the HetNet topology to overcome the complex coupling between user scheduling and RB blanking.
Simple scheduling scheme to approach the NUM solution. Note that the NUM solution provides
the desirable resource allocation at a coarser time scale. In Sec. VI, we further present scheduling
policies at a finer time scale (i.e., RB level), which target approaching the optimized resource allocation
in the long term.

Simulations^4 in Sec. VII show that the proposed harmonized CoMP/cellular operation can provide
significant gains with respect to cellular-only massive MIMO operation [15], especially for cell-edge
users. For example, the rate of bottom (10th percentile) users in our setup is about 2.2x the rate obtained
with the optimal user association without interference management, which itself is much larger than the
rate obtained with the max-SINR association. Also, the utility provided by the proposed scheduling scheme is within 90% of
the utility provided by the NUM solution.

For convenience, the key notation in this paper is summarized in Table I.

II. System Model 

In this paper, we focus on best-effort traffic and consider downlink (DL) transmission in a HetNet
with $J$ BSs and $K$ single-antenna users. We let $j \in \mathcal{B} = \{1, 2, \ldots, J\}$ and $k \in \mathcal{U} = \{1, 2, \ldots, K\}$ denote
the indices of BSs and users, respectively. Without loss of generality, we focus on two-tier HetNets
comprising a macro layer and a small-cell layer, such as the one considered in Fig. 1. Letting $\mathcal{B}_m$ and
$\mathcal{B}_s$ be the sets of macro and small cell BSs, respectively, the BSs belonging to $\mathcal{B}_m$ and $\mathcal{B}_s$ can differ in
terms of transmit power, size, density, and number of antennas [1]. The number of antennas at BS $j$ is
denoted by $M_j$ with $M_j \gg 1$. We assume time division duplex (TDD) operation with reciprocity-based
CSI acquisition [4,42]. Hence, each user sends a single uplink (UL) pilot to train multiple nearby BSs.
In contrast to feedback-based CSI acquisition, this enables the training of large antenna arrays with
overhead proportional to the number of simultaneously served users. Moreover, it enables CoMP with
practical CSI acquisition overheads. We also assume a block-fading channel model where the channel
coefficients remain constant within each RB [4,5,42,43].

With massive MIMO, a subset of users is scheduled (or, in other words, is active) on each RB.
Various transmission schemes in terms of BS-cluster options are possible with this setup, as shown
in Fig. 1. At one extreme is cellular transmission shown in Fig. 1(a), where the coded data for any
given active user is transmitted from a single BS and each BS manages interference only to the users
it serves. This corresponds to having $|\mathcal{B}|$ disjoint clusters of size 1. Fig. 1(c) shows the other
extreme, Global Joint Transmission (ideal/full CoMP), where all BSs serve and coordinate interference
to all active users in the network. In this case there is a single cluster serving all of the users. Other
transmission schemes with cluster options in between these two extremes are also possible. DCC [39]
can be used to describe any of these transmission schemes. Fig. 1(b) shows the scheme considered in
this work, where BS clusters are user-specific. According to this scheme, different active users can be
served simultaneously by different, potentially overlapping clusters.

^4 Some of these results are also published in [41].

TABLE I

Notation | Description
$\mathcal{B}_m$, $\mathcal{B}_s$, $\mathcal{B}$, $\mathcal{U}$ | set of macro BSs, small cell BSs, all BSs, UEs, respectively
$M_j$ | number of antennas at BS $j$
$P_j$ | transmit power of BS $j$
$\mathbf{G}_j$ | channel matrix between BS $j$ and its users
$\mathbf{g}_{kj}$ | channel vector between BS $j$ and UE $k$
$g_{kj,i}$, $h_{kj,i}$ | channel, fast fading between $i$-th antenna of BS $j$ and UE $k$, respectively
$\beta_{kj}$ | slow fading between BS $j$ and UE $k$
$\mathbf{f}_{kj}$ | precoder of BS $j$ for UE $k$
$w_k$, $\sigma^2$ | AWGN, variance of AWGN, respectively
$L$ | cluster size
$S_j$ | number of users that can be simultaneously served by BS $j$ in cellular transmission
$S_j(L)$ | number of users that can be simultaneously served by BS $j$ in cluster of size $L$
$A$ | operation option
$\mathcal{B}^{(A)}$ | set of active BSs in operation $A$
$L_{\max}^{(A)}$ | maximal cluster size in operation $A$
$C$ | cluster index
$\mathcal{U}_C$ | set of users served by cluster $C$
$y_k(t)$ | received signal at UE $k$ on RB $t$
$s_k$ | transmitted signal for UE $k$
$r_{kC}^{(A)}$ | instantaneous rate of user $k$ from cluster $C$ in Band-$A$ in massive MIMO regime
$x_{kC}^{(A)}$ | activity fraction of UE $k$ from cluster $C$ in Band-$A$
$R_k$ | long-term rate of UE $k$
$q_A$ | fraction of RBs allocated to Band-$A$
$q_{AL}$ | fraction of RBs allocated to size-$L$ clusters in Band-$A$
$\tilde{x}_{kC}$ | approximate activity fraction by unique association
$C^*(k)$ | the cluster that serves UE $k$ in the unique association
$a_k$ | the desired fraction of resources for UE $k$ in the considered band
$\tilde{R}_k$ | assumed rate for the VQ scheduling, where $\tilde{R}_k = 1/a_k$
$A_{\max}$, $V$ | sufficiently large parameters in VQ scheduling scheme
$Q_k(t)$, $A_k(t)$ | VQ length, arrival rate of UE $k$ in VQ scheduling scheme, respectively
$\rho$ | a tunable parameter to characterize how many users the BSs can schedule simultaneously

Fig. 1. Various MIMO transmission schemes: (a) Cellular, (b) Local Joint Transmission, (c) Global Joint Transmission. The color of the
beam serving a user indicates where the CSI used for precoding is obtained. For both cellular and local joint transmission, beams have
the same color as the BS they are emitted from, because in these cases precoding is done using only locally obtained CSI. With global
joint transmission, on the other hand, beams are precoded jointly using CSI from all BSs, hence the beams are colored gray.

CoMP has some well-known advantages over cellular transmission:

(1) Performance gain at the cell edge: The beamforming (BF) gain becomes intra-cluster BF gain
in CoMP, as the same coded data is transmitted from all BSs serving the user. Similarly, the intra-cell
interference mitigation is extended across the cluster of BSs by which the user is served. As a result,
the performance gain can be realized at the cell edge.

(2) Low training overhead: A single UL pilot from a user can be received at all nearby BS antennas,
whether these are in the same or different locations. Thus, the CSI acquisition between a user and nearby
BSs need not incur additional overheads with respect to cellular transmission in the TDD system. The
total number of users that can be served by the system depends on the number of available UL pilots,
and it should be the same for cellular transmission and CoMP if the same UL pilots are used in both
cases.

A major disadvantage of general CoMP [18,44,45] is that it incurs additional overhead compared
to cellular transmission for CSI exchanges among cooperating BSs. To overcome this challenge, we
focus on schemes that perform local precoding at each BS: The user beams (i.e., precoding vectors) on
any RB for any BS $j$ are designed as if BS $j$ were in cellular multi-user (MU)-MIMO transmission over
all the users it serves. All BSs serving a user transmit the same coded data stream to the user [35].
Each BS transmits the stream on a beam that is (independently) designed for the users at that BS. For
instance, the beam with Linear Zero Forcing Beamforming (LZFBF) for each user served by BS $j$ is
chosen within the null space of the channels of all the other users served by BS $j$, no matter whether
there are additional BSs serving the user on the same RB or not. Due to local precoding, BS $j$ only
needs CSI between the users it serves and its own antennas to generate the user beams. Hence, the
challenge of costly CSI exchanges between BSs is eliminated.
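The null-space idea behind local zero forcing can be shown for the simplest case of one BS serving two users. The sketch below projects the served user's channel onto the orthogonal complement of the other user's channel, using only channels known at that BS. The channel values, the function names and the convention that the leakage toward a user is measured by the Hermitian inner product of its channel with the beam are assumptions made for illustration.

```python
def inner(a, b):
    """Hermitian inner product a^H b for complex vectors given as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def zf_beam_two_users(g_own, g_other):
    """Unit-norm beam for the user with channel g_own that nulls g_other.

    Projects g_own onto the orthogonal complement of g_other, which is
    the zero-forcing direction when the BS serves exactly two users.
    """
    coeff = inner(g_other, g_own) / inner(g_other, g_other)
    f = [x - coeff * y for x, y in zip(g_own, g_other)]
    norm = abs(inner(f, f)) ** 0.5
    return [x / norm for x in f]


# Hypothetical 4-antenna channels from one BS to its two scheduled users.
g1 = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j, 0.9j]
g2 = [0.2 - 0.7j, 1.1 + 0.1j, 0.4j, -0.6 + 0.3j]

f1 = zf_beam_two_users(g1, g2)
leak = abs(inner(g2, f1))  # interference created toward user 2
gain = abs(inner(g1, f1))  # useful signal amplitude at user 1
```

Only the two channels toward this BS enter the computation, which mirrors the point made above that no CSI from other BSs is needed to form the beam.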



In contrast to full CoMP,^5 with local precoding of the form depicted in Fig. 1(b), the instantaneous
rate is independent of the other active users’ channels and only depends on the power allocation for
user streams served by the BS. By allocating BS power equally across the set of active users^6 and fixing
the scheduling set sizes for BSs belonging to common BS clusters, the instantaneous rate can be predicted
a priori and independently of the set of other active users, thereby substantially reducing the complexity
of the resource allocation problem.

In this work, we restrict our attention to a small set of predefined possible scheduling set sizes.
How to select these sizes depends on many factors (e.g., number of antennas at the BSs, network
deployment, etc.), and we leave this study to future work. Because an uplink user pilot
trains all antennas at nearby BSs, we assume that BSs serving users in larger clusters (during a
given scheduling slot) can schedule a larger number of users in the same slot than BSs serving users
in smaller clusters or with cellular transmission. We thus let the size of the scheduling set of any BS
$j$ depend on the size of the cluster including BS $j$. To avoid ambiguity in the scheduling set size of
different BSs in the same BS cluster, we enforce the following condition: all users served by a given
BS $j$ are served in clusters of the same size on a given RB. This constraint ensures that overlapping
clusters on any RB are of the same size. Let $S_j$ denote the number of users that can be simultaneously
served by BS $j$ in cellular transmission. We summarize the properties of the specific form of LJT
considered in this work in the following definition:

Definition 1. Admissible Local Joint Transmission Schemes (ALJTSs): An ALJTS schedules users
for transmission on a sequence of RBs, and satisfies the following on each RB:

(1) All users served by a given BS $j$ are served in BS clusters of the same size $L$, for some $L \geq 1$;

(2) BS $j$ in BS clusters of size $L$ serves at most $S_j(L)$ users with $S_j \leq S_j(L) \leq L S_j$ and $M_j \gg S_j(L)$;^7

(3) The user beams (i.e., precoding vectors) at BS $j$ are designed as if BS $j$ were performing cellular
multi-user (MU)-MIMO transmission over (at most) $S_j(L)$ users;

(4) All BSs serving an active user transmit the same coded data stream to the user. Each BS transmits
this user stream on a beam that is (independently) designed for this user at that BS;

(5) Any BS $j$ shares its transmit power $P_j$ equally among the users it serves.

^5 With general CoMP schemes (e.g., cluster ZFBF transmission), the instantaneous rate that is provided to a user by a cluster of BSs
is also a function of the identity (and in fact the large-scale channel coefficients) of the set of the other users scheduled for cluster
transmission with the user, and can thus vary from slot to slot depending on the scheduling set [45]. As a result, in contrast to the LJT
schemes we consider (whereby a user’s instantaneous rate is independent of the identity of the active users in the slot), general CoMP
schemes are not amenable to the load balancing and scheduling techniques presented in this work.

^6 Power allocation is a thoroughly studied topic in the context of MIMO in general (see, e.g. [46]). With large antenna arrays, massive
MIMO systems are able to get substantially better SINRs even without considering any power allocation optimization. For example,
[45] considers equal power allocation while Marzetta in his pioneering work [4] allocates power to users proportional to their channel
gains, which is in the reverse direction of typical power allocation in the context of a fairness criterion (whereby more power is typically
allocated at the cell-edge). Following this trend in massive MIMO and considering the high complexity of power allocation optimization,
we consider equal active user-stream power allocation at each BS. This approach simplifies the parametrization of peak-rate calculations
in Section III and yields the convex NUM formulation in Section IV.

^7 In this work, the $S_j$’s and $S_j(L)$’s are assumed to be predefined parameters. Their impact on performance (and their choice) in practice
depends on many factors (e.g., available UL pilot dimensions per RB for training in the network, spatial pilot reuse, network deployment,
etc.).

As explained in Sec. III, the properties of ALJTS enable us to decouple the scheduling and
load balancing problems, since the instantaneous active-user rates depend only on the serving BS cluster and are
predictable a priori based on the given $S_j(L)$ value.

To illustrate these principles, we give LJT examples that obey these rules in Table II, involving
clusters of sizes 1 (i.e., cellular transmission) and 2. Four BSs are considered with $P_j = 1$, $S_j(1) = 2$ and
$S_j(2) = 3$. As the table reveals, each BS on RB #1 engages in cellular transmission. On RB #2, pairs of
BSs perform LJT with each BS pair serving a triplet of users. RBs #3 and #4 provide additional, more
interesting modes. On RB #3, no two users are served by the same cluster. On RB #4, BSs 1 and 2 serve users
in clusters of size 2, while BSs 3 and 4 serve users in cellular transmission. Note that if orthogonal
pilots are used, (at least) 8, 6, 6 and 7 uplink pilot dimensions (one dimension per user) are needed to
enable RBs #1, #2, #3 and #4, respectively. Evidently, the choice of scheduling set sizes $S_j(L)$ reflects
how aggressively pilot dimensions are reused across the network (e.g., $S_j$ for fully reused pilots and
$L S_j$ for orthogonal pilots). Inspection of Table II reveals that the first ALJTS property can be satisfied
either if all clusters on an RB are of the same size (RBs #1, #2, #3) or if clusters of different sizes are disjoint
across BSs (RB #4).
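The Table II examples can be checked mechanically. The sketch below encodes the served-user lists of the four RBs, takes a user's cluster size to be the number of BSs serving it, verifies the first ALJTS property for every BS, and counts the orthogonal pilot dimensions as the number of distinct active users. The function and variable names are illustrative and do not come from the paper.

```python
def check_rb(served):
    """Check ALJTS property (1) on one RB and count pilot dimensions.

    served: dict BS index -> list of user indices served on this RB.
    Returns (ok, cluster_size_per_bs, num_pilots), where a user's cluster
    size is the number of BSs serving it and num_pilots assumes one
    orthogonal uplink pilot per distinct active user.
    """
    size = {}
    for users in served.values():
        for u in users:
            size[u] = size.get(u, 0) + 1
    # Set of cluster sizes seen by each BS across the users it serves.
    per_bs = {bs: {size[u] for u in users} for bs, users in served.items()}
    ok = all(len(s) <= 1 for s in per_bs.values())
    cluster_size = {bs: (min(s) if s else 0) for bs, s in per_bs.items()}
    return ok, cluster_size, len(size)


# Served users per BS for RBs #1 to #4, copied from Table II.
TABLE_II = {
    1: {1: [1, 2], 2: [3, 4], 3: [5, 6], 4: [7, 8]},
    2: {1: [1, 2, 3], 2: [1, 2, 3], 3: [4, 5, 6], 4: [4, 5, 6]},
    3: {1: [1, 2, 3], 2: [1, 4, 5], 3: [2, 4, 6], 4: [3, 5, 6]},
    4: {1: [1, 2, 3], 2: [1, 2, 3], 3: [4, 5], 4: [6, 7]},
}

results = {rb: check_rb(served) for rb, served in TABLE_II.items()}
pilots = [results[rb][2] for rb in sorted(results)]
```

The pilot counts reproduce the values 8, 6, 6 and 7 quoted in the text, and the per-BS cluster sizes on RB #4 show the mixed case where size-2 and size-1 clusters use disjoint BSs.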

TABLE II

Example of RBs enabled by LJT over 4 BSs.

RB | Quantity     | BS 1  | BS 2  | BS 3  | BS 4
#1 | Cluster Size | 1     | 1     | 1     | 1
#1 | User Power   | 1/2   | 1/2   | 1/2   | 1/2
#1 | Served Users | 1,2   | 3,4   | 5,6   | 7,8
#2 | Cluster Size | 2     | 2     | 2     | 2
#2 | User Power   | 1/3   | 1/3   | 1/3   | 1/3
#2 | Served Users | 1,2,3 | 1,2,3 | 4,5,6 | 4,5,6
#3 | Cluster Size | 2     | 2     | 2     | 2
#3 | User Power   | 1/3   | 1/3   | 1/3   | 1/3
#3 | Served Users | 1,2,3 | 1,4,5 | 2,4,6 | 3,5,6
#4 | Cluster Size | 2     | 2     | 1     | 1
#4 | User Power   | 1/3   | 1/3   | 1/2   | 1/2
#4 | Served Users | 1,2,3 | 1,2,3 | 4,5   | 6,7

Depending on the availability of bands and the preferences of the operator, in 5G HetNets, different
groups of BSs might be allowed to jointly transmit in clusters across groups of RBs (e.g., across different
frequency bands). Each such combination of BS clusters is considered separately as a distinct entity in
the resource allocation problem. Given the set of BSs $\mathcal{B}$ in the network, there are $2^{|\mathcal{B}|} - 1$ possible
different entity/operation options. Although in principle it is straightforward to take into account all
these different options, for simplicity and considering the general interest in 5G HetNet deployments,
we focus only on the following options: 1) macro and small cell layers may operate together; 2) only
the macro layer operates; 3) only the small cell layer operates. We call these three operations
shared, macro-only and blanking operations, respectively. Let $A \in \{1,2,3\}$ denote the operation type,
where $A = 1$, $A = 2$ and $A = 3$ denote shared, macro-only and blanking operations, respectively. Let
the RBs allocated to operation $A$ form Band-$A$, and let the set of BSs that can transmit in Band-$A$ be
$\mathcal{B}^{(A)}$. We have the following cases:

1) $A = 1$: shared operation is considered for this band, where macro and small cell BSs can both
transmit, i.e., $\mathcal{B}^{(1)} = \mathcal{B}$. In this band, clusters can be formed by BSs from different layers.

2) $A = 2$: macro-only operation is considered for this band, and only macros can transmit, i.e., $\mathcal{B}^{(2)} = \mathcal{B}_m$.

3) $A = 3$: blanking is applied to this band, where all of the macro BSs are muted, i.e., $\mathcal{B}^{(3)} = \mathcal{B}_s$.^8

Different resource allocations among operations correspond to different scenarios. For example, scenarios
with $A \in \{2,3\}$ correspond to cases where orthogonal RBs are allocated to macro and small cells, while
scenarios with $A \in \{1,3\}$ can be applied to cases with eICIC. In some scenarios, resource allocation
among operations can be fixed a priori. In more flexible scenarios, resource allocation among operations
can be a part of the optimization problem. Our formulation in Sec. IV can be applied to both cases.

To show how these different transmission operations of practical interest can be considered within
our specific LJT, we give a small example in Table III. As different bands may prefer different cluster
sizes, we consider clusters up to size $L_{\max}^{(A)}$ in Band-$A$. Then the potential BS clusters that can be
active in this band are given by the subsets of $\mathcal{B}^{(A)}$ with size less than or equal to $L_{\max}^{(A)}$.^9 Table III is
extended from Table II by adding the macro BS (BS #5) and considering the other BSs as small cell BSs,
i.e., $\mathcal{B}_m = \{5\}$ and $\mathcal{B}_s = \{1,2,3,4\}$. Assume $L_{\max}^{(1)} = 2$, $L_{\max}^{(2)} = 1$ and $L_{\max}^{(3)} = 2$ in this example. RBs b
and d are in Band-1 (shared operation), RB e is in Band-2 (macro-only operation), and RBs a and c
are in Band-3 (blanking operation). In RB b, each BS is performing cellular transmission, while RB d
considers clusters of size 2 including both macro and small cell BSs. In RB e, the macro BS (BS #5)
serves users via cellular transmission. In RBs a and c, only small cells serve users while the macro BS
is muted. In fact, clusters in RBs a and c are the same as in RBs #1 and #3 in Table II, respectively.

TABLE III

Example of RBs enabled by ALJTS over 4 small cell BSs (BSs 1-4) and 1 macro BS (BS 5).

RB | Quantity     | BS 1  | BS 2  | BS 3  | BS 4  | (Macro) BS 5
a  | Cluster Size | 1     | 1     | 1     | 1     | -
a  | User Power   | 1/2   | 1/2   | 1/2   | 1/2   | -
a  | Served Users | 1,2   | 3,4   | 5,6   | 7,8   | -
b  | Cluster Size | 1     | 1     | 1     | 1     | 1
b  | User Power   | 1/2   | 1/2   | 1/2   | 1/2   | 1/2
b  | Served Users | 1,2   | 3,4   | 5,6   | 7,8   | 9,10
c  | Cluster Size | 2     | 2     | 2     | 2     | -
c  | User Power   | 1/3   | 1/3   | 1/3   | 1/3   | -
c  | Served Users | 1,2,3 | 1,4,5 | 2,4,6 | 3,5,6 | -
d  | Cluster Size | 2     | 2     | 2     | 2     | 2
d  | User Power   | 1/3   | 1/3   | 1/3   | 1/2   | 1/3
d  | Served Users | 1,2,3 | 1,4,5 | 2,4,6 | 3,7   | 5,6,7
e  | Cluster Size | -     | -     | -     | -     | 1
e  | User Power   | -     | -     | -     | -     | 1/2
e  | Served Users | -     | -     | -     | -     | 1,2

^8 In general, blanking can be applied to only a subset of macro BSs. In fact, [29] has shown gains compared to blanking all macro
BSs. Our formulation can include partial blanking by considering new operation options. To simplify the exposition, in this work we
confine ourselves to blanking all of the macro BSs.

^9 Many of these clusters are not necessary. For example, clusters between BSs that are geographically distant are not necessary to
consider, as in a practical system no user would be assigned to these clusters. This type of practical observation can eliminate many
cluster options for all RBs. In the following, to avoid cumbersome notation, while listing potential clusters, we only use the cluster size
and the active BS set of the corresponding band to describe potential BS clusters.
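To make the notion of potential clusters concrete, the sketch below enumerates, for each band of the five-BS example, all non-empty subsets of the active BS set up to the band's maximal cluster size. It assumes one macro BS (BS 5), four small cell BSs (BSs 1 to 4), and maximal cluster sizes of 2, 1 and 2 for the shared, macro-only and blanking bands; these sizes are an interpretation of the example, and the names are illustrative.

```python
from itertools import combinations

B_M = {5}            # macro BSs in the Table III example
B_S = {1, 2, 3, 4}   # small cell BSs in the Table III example

# Band index -> active BS set (1: shared, 2: macro-only, 3: blanking).
ACTIVE = {1: B_M | B_S, 2: B_M, 3: B_S}
# Band index -> assumed maximal cluster size.
L_MAX = {1: 2, 2: 1, 3: 2}


def potential_clusters(band):
    """All non-empty subsets of the band's active BS set up to L_MAX[band]."""
    bss = sorted(ACTIVE[band])
    return [c for size in range(1, L_MAX[band] + 1)
            for c in combinations(bss, size)]


counts = {band: len(potential_clusters(band)) for band in ACTIVE}
```

Even in this small example the shared band already has 15 potential clusters, which illustrates why pruning geographically implausible clusters matters in larger networks.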

Remark 1. For any band, potential BS clusters are given by subsets (of appropriate size) of the active
BS set on that band. As noted earlier, some subsets can be eliminated given the topology and constraints
in the network. At the scheduling slot scale, the active BS clusters are determined by the scheduler that is
operated by the central controller. Fig. 2 provides a flow diagram for the interactions between different
entities, such as BSs, users and the central controller, which includes a load balancing unit and a scheduling unit.
These interactions between different entities and the processing performed within different entities are
discussed in Sections III, VI and VII.

III. Instantaneous and Long-term Rates 

In this section, we provide proxy expressions for instantaneous rates and long-term rates (throughput) 
with either LZFBF or Maximum Ratio Transmission (MRT, also known as Conjugate Beamforming). 
Fig. 2. Given the resource allocation done by the load balancer at the central controller, the scheduling is done at the scheduler unit in
the central controller.

We denote the transpose and conjugate transpose of matrices by $(\cdot)^T$ and $(\cdot)^H$, respectively.

On a generic RB $t$, we let $g_{kj,i} = \sqrt{\beta_{kj}}\, h_{kj,i}$ be the channel between the $i$-th transmit antenna of
BS $j$ and any user $k$, and it includes both slow fading $\beta_{kj}$ and fast fading $h_{kj,i}$. The slow fading $\beta_{kj}$
characterizes the combined effect of distance-based path loss and location-based shadowing. Let $\mathcal{K}_j(t)$
be the set of users served by BS $j$ and $S_j(t) = |\mathcal{K}_j(t)|$ denote the number of users scheduled by BS $j$ on
RB $t$. The channel matrix between BS $j$ and its active users (the users scheduled by this BS on this RB)
is denoted by $\mathbf{G}_j$, where the dimension of $\mathbf{G}_j$ is $M_j \times S_j(t)$. The $m$-th column of $\mathbf{G}_j$ corresponds to the
channel of user $k = k_j(m)$ for some $k \in \{1, 2, \ldots, K\}$. That is, for a given $m$ we have $[\mathbf{G}_j]_{im} = g_{kj,i}$,
with $k = k_j(m)$. This expression can also be interpreted in terms of the inverse mapping $m = m_j(k)$,
that is, for a given $k$ we have $[\mathbf{G}_j]_{im} = g_{kj,i}$, with $m = m_j(k)$.

We assume each link experiences independent Rayleigh fading, i.e., the entries of 
$\mathbf{h}_{kj} = [h_{kj,1}, \ldots, h_{kj,M_j}]^T$ are i.i.d. complex Gaussian random variables.¹⁰ We let $\mathbf{F}_j$ denote the 
precoding matrix at BS $j$ with dimension $M_j \times S_j(t)$, whose $m$-th column $\mathbf{f}_{mj}$ is the beam (i.e., the 
precoding vector) for the $m$-th user of BS $j$, i.e., the user whose channel to the BS is given by the 
$m$-th column of $\mathbf{G}_j$. The signal symbol of user $k$ is denoted by $s_k$, where $s_k$ has unit energy. The 
thermal noise at user $k$ is denoted by $w_k$, which is assumed to be additive white Gaussian noise (AWGN) 
with variance $\sigma^2$. 

¹⁰The instantaneous user-rate expressions in this paper assume no spatial correlation in the user channels. In principle, instantaneous 
user-rate expressions can be developed for spatially correlated user channels with a given spatial correlation structure by using the method 
of deterministic equivalents [5]. The framework presented in Secs. IV-VI, however, is not directly applicable with spatially correlated 
user channels, since a user's instantaneous rate would in general depend on the spatial correlation of the other scheduled user channels. 
Although beyond the scope of this paper, extensions involving spatially correlated user channels are an area worth further investigation. 

We consider a scheduling policy on RBs $\{1, 2, \ldots, T\}$ and assume that all the large-scale coefficients 
stay fixed within this period. Any such scheduling policy can be described in terms of the scheduling 
sets $\{\mathcal{U}_C(t);\ \forall C, \forall t \in \{1, 2, \ldots, T\}\}$, where $\mathcal{U}_C(t)$ denotes the set of users served by cluster $C$ on RB $t$ 
and $\forall C$ ranges over all of the possible cluster options considered for all bands, taking into account active 
BS sets, cluster sizes and possible cluster selections/eliminations. Without loss of generality, we assume 
that RB $t$ is in Band-$A$ and $C \subseteq \mathcal{B}^{(A)}$. Thus, the received signal at user $k \in \mathcal{U}_C(t)$ on RB $t$ is 


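$$
\begin{aligned}
y_k(t) ={}& \underbrace{\sum_{j \in C} \sqrt{\frac{P_j}{S_j(|C|)}}\; \mathbf{g}_{kj}^H(t)\, \mathbf{f}_{m_j(k)j}(t)\, s_k}_{\text{desired}}
 + \underbrace{\sum_{j \in C}\ \sum_{u \in \mathcal{K}_j(t),\, u \neq k} \sqrt{\frac{P_j}{S_j(|C|)}}\; \mathbf{g}_{kj}^H(t)\, \mathbf{f}_{m_j(u)j}(t)\, s_u}_{\text{intra-cluster interference}} \\
 &+ \underbrace{\sum_{l \in \mathcal{B}^{(A)} \setminus C}\ \sum_{u \in \mathcal{K}_l(t)} \sqrt{\frac{P_l}{S_l(t)}}\; \mathbf{g}_{kl}^H(t)\, \mathbf{f}_{m_l(u)l}(t)\, s_u}_{\text{inter-cluster interference}}
 + \underbrace{w_k(t)}_{\text{noise}} \qquad (1)
\end{aligned}
$$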

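Adopting LZFBF, the precoding matrix at BS $j$ is $\mathbf{F}_j = \mathbf{G}_j (\mathbf{G}_j^H \mathbf{G}_j)^{-1} \boldsymbol{\Lambda}_j^{1/2}$, where $\boldsymbol{\Lambda}_j$ is the 
normalizing coefficients matrix. Specifically, $\boldsymbol{\Lambda}_j$ is an $S_j(L) \times S_j(L)$ diagonal matrix with the $k$th diagonal 
element being $a_{k,k} = 1 / \left[(\mathbf{G}_j^H \mathbf{G}_j)^{-1}\right]_{k,k}$, where $\left[(\mathbf{G}_j^H \mathbf{G}_j)^{-1}\right]_{k,k}$ denotes the $k$th row and $k$th column element 
of the matrix $(\mathbf{G}_j^H \mathbf{G}_j)^{-1}$. In this case, the intra-cluster interference is 0. With MRT, the precoding matrix 
at BS $j$ is $\mathbf{F}_j$ with the $m$th column being $\mathbf{f}_{mj} = \mathbf{g}_{k_j(m)j} / \|\mathbf{g}_{k_j(m)j}\|$. Note that the set of interfering BSs depends 
on the operation band. 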
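The difference between the two precoders can be illustrated with a small numerical sketch. The code below is only an illustration under simplifying assumptions: it handles exactly two users at one BS, uses plain Python lists of complex numbers, and all names (`conj_dot`, `zf_beam`, `mrt_beam`) and the toy value `M = 16` are hypothetical. For two users, the unit-norm zero-forcing beam is obtained by removing from a user's channel its component along the other user's channel, which gives the same direction as the corresponding normalized column of $\mathbf{G}_j(\mathbf{G}_j^H\mathbf{G}_j)^{-1}$.

```python
import random

def conj_dot(a, b):
    """Inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def unit(v):
    """Scale a complex vector to unit norm."""
    norm = abs(conj_dot(v, v)) ** 0.5
    return [x / norm for x in v]

def mrt_beam(g):
    """MRT: the beam points along the user's own channel."""
    return unit(g)

def zf_beam(g, other):
    """Two-user zero forcing: remove from g its component along the other channel."""
    coeff = conj_dot(other, g) / conj_dot(other, other)
    return unit([x - coeff * y for x, y in zip(g, other)])

random.seed(0)
M = 16  # number of BS antennas (toy value, an assumption of this sketch)
g1 = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(M)]
g2 = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(M)]

# Interference amplitude seen by user 2 from the beam intended for user 1.
leak_zf = abs(conj_dot(g2, zf_beam(g1, g2)))
leak_mrt = abs(conj_dot(g2, mrt_beam(g1)))
```

With zero forcing the leakage toward the other user vanishes up to rounding, while with MRT it is non-zero, which is the intra-cluster interference that appears in the MRT rate expression.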


A. Instantaneous Rate 

In this paper, we assume that each BS has available perfect CSI regarding the user terminals it 
serves.¹¹ Let $S_j$ denote the number of users served by BS $j$ in cellular transmission, with $S_j \ll M_j$. 
Under mild assumptions on fading, the user instantaneous rates on RB $t$, $r_{kj}(t)$, can be predicted a 
priori in the massive MIMO regime [15]. In particular, there exist deterministic quantities $\{r_{kj}\}$ such 
that $r_{kj}(t) \to r_{kj}$, $\forall k$ and $\forall j \in \mathcal{B}$, as $M_j, S_j \to \infty$ with fixed $\nu_j = S_j/M_j > 0$ [4,5,42]. This 
convergence is very fast with respect to the $M_j$'s. Unlike general CoMP, where a user's instantaneous rate 
depends on the other users co-scheduled on the same RB [45], the ALJTS makes a user's instantaneous 
rate independent of the other users in the scheduling set. Let $r_{kC}^{(A)}(t)$ be the instantaneous rate of user $k$ 
from cluster $C$ on RB $t$ in Band-$A$. There exist deterministic quantities $r_{kC}^{(A)}$ such that $r_{kC}^{(A)}(t) \to r_{kC}^{(A)}$ 
as $M_j, S_j(|C|) \to \infty$ with $\nu_j = S_j(|C|)/M_j > 0$, $\forall j \in C$. 

¹¹Massive MIMO rate calculations for the general CoMP setting with practical TDD UL training for CSI acquisition can be found in [45]. 
The instantaneous rate expression in this manuscript can be updated accordingly, taking into account the pilot contamination term and the 
training overhead, as a special case of [45]. This was done for the cellular case in [15]. 

Using the techniques in [45,47], we can show that the approximate instantaneous rate of user $k$ from 
cluster $C \subseteq \mathcal{B}^{(A)}$ in Band-$A$ using LZFBF is 

$$
r_{kC}^{(A)} = \log_2\!\left(1 + \frac{\sum_{j \in C}\sum_{l \in C} \sqrt{P_j P_l\, \beta_{kj}\beta_{kl}\, h_j(|C|)\, h_l(|C|)}}{\sigma^2 + \sum_{l \in \mathcal{B}^{(A)} \setminus C} P_l \beta_{kl}}\right), \qquad (2)
$$

where $h_j(a) = \frac{M_j - S_j(a) + 1}{S_j(a)}$. Similarly, the approximate instantaneous rate of user $k$ from cluster $C$ in 
Band-$A$ using MRT is 

$$
r_{kC}^{(A)} = \log_2\!\left(1 + \frac{\sum_{j \in C}\sum_{l \in C} \sqrt{\dfrac{P_j P_l M_j M_l\, \beta_{kj}\beta_{kl}}{S_j(|C|)\, S_l(|C|)}}}{\sigma^2 + I_{kC} + \sum_{l \in \mathcal{B}^{(A)} \setminus C} P_l \beta_{kl}}\right), \qquad (3)
$$

where $I_{kC} = \sum_{j \in C} \frac{S_j(|C|) - 1}{S_j(|C|)} P_j \beta_{kj}$ is the non-zero intra-cluster interference. Clearly, $I_{kC} = 0$ if 
$S_j(|C|) = 1$ for all $j \in C$. 

Eqs. (2) and (3) assume that, $\forall j \in C$, BS $j$ serves $S_j(|C|)$ users and allocates power $P_j/S_j(|C|)$ 
(i.e., a $1/S_j(|C|)$ fraction of its power) to each user. In the case that fewer users are served by one of 
the BSs, (2) and (3) represent achievable lower-bound instantaneous rates. 
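The two rate proxies can be evaluated with a few lines of code. The following Python sketch is illustrative only: the function names and the toy numbers are hypothetical, all quantities are in linear units, the gain factor is taken as $h_j(a) = (M_j - S_j(a) + 1)/S_j(a)$, and the MRT intra-cluster term is written as $\sum_{j \in C} \frac{S_j - 1}{S_j} P_j \beta_{kj}$, which is an assumption of this sketch.

```python
import math

def h_zf(M, S):
    """LZFBF gain factor h_j(a) = (M_j - S_j(a) + 1) / S_j(a) (assumed form)."""
    return (M - S + 1) / S

def coherent_sum(terms):
    """(sum_j sqrt(t_j))^2, i.e. the double sum over j, l of sqrt(t_j * t_l)."""
    return sum(math.sqrt(t) for t in terms) ** 2

def rate_lzfbf(P, beta, M, S, interf, noise):
    """Eq. (2). P, beta, M, S are per-BS lists for the serving cluster;
    interf is the sum of P_l * beta_kl over active BSs outside the cluster."""
    signal = coherent_sum([p * b * h_zf(m, s) for p, b, m, s in zip(P, beta, M, S)])
    return math.log2(1 + signal / (noise + interf))

def rate_mrt(P, beta, M, S, interf, noise):
    """Eq. (3). Same inputs; adds the intra-cluster interference term I_kC."""
    signal = coherent_sum([p * b * m / s for p, b, m, s in zip(P, beta, M, S)])
    intra = sum((s - 1) / s * p * b for p, b, s in zip(P, beta, S))
    return math.log2(1 + signal / (noise + intra + interf))

# Toy example: one BS with 10 antennas serving 2 users, unit power, gain and noise.
r_zf = rate_lzfbf([1.0], [1.0], [10], [2], interf=0.0, noise=1.0)
r_mrt = rate_mrt([1.0], [1.0], [10], [2], interf=0.0, noise=1.0)
```

In this toy setting LZFBF gives the higher rate because MRT pays for the residual intra-cluster interference; when each BS serves a single user the two expressions coincide.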


It is worth noting that LJT provides instantaneous-rate improvements at the cell-edge with respect 
to cellular transmission, as it replaces the intra-cell BF gain provided by the cellular schemes with 
intra-cluster BF gain. Also, given that macro BSs do not transmit in Band-3 (i.e., the blanking operation), 
users in small cells benefit from larger SINRs in Band-3, as there is no interference from macro BSs to 
small cell users in this band. 


B. Long-term Rates 

As discussed in [15], users can be served (at distinct scheduling instances) by more than one BS in 
massive MIMO networks in cellular transmission. Similarly, users can be served by different clusters 
on different RBs in ALJTS. Let $x_{kC}^{(A)} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{k \in \mathcal{U}_C(t),\ \text{RB } t \text{ in Band-}A\}$ denote the activity fraction 
of user $k$ to cluster $C$ in Band-$A$, that is, the fraction of resources allocated by cluster $C$ to user $k$ in 
Band-$A$. The activity fraction $x_{kC}^{(A)}$ is a real number showing the fraction of RBs (averaged over many 
slots within which the load balancing is considered) where user $k$ is served by cluster $C$ in Band-$A$. If 
$x_{kC}^{(A)} > 0$, user $k$ is served by cluster $C$ in Band-$A$. We obtain the long-term rate similarly to [15], using 
instantaneous rates and activity fractions from the scheduling policy. In particular, in the limit $T \to \infty$, 
the long-term rate of user $k$ equals 

$$
R_k = \sum_{A=1}^{3}\ \sum_{C:\, C \subseteq \mathcal{B}^{(A)}} x_{kC}^{(A)}\, r_{kC}^{(A)}. \qquad (4)
$$
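As a small worked instance of the long-term rate in (4), the sketch below sums activity fraction times instantaneous rate over all (band, cluster) pairs of one user. The dictionary layout, the BS labels and the numbers are hypothetical and chosen only for illustration.

```python
def long_term_rate(activity, rate):
    """Eq. (4): sum over (band, cluster) pairs of x_kC^(A) * r_kC^(A) for one user."""
    return sum(x * rate[key] for key, x in activity.items())

# Keys are (band A, cluster C); a cluster is a tuple of BS labels.
activity = {(1, ("bs1",)): 0.10, (1, ("bs1", "bs2")): 0.05, (3, ("bs2",)): 0.0}
rate = {(1, ("bs1",)): 2.0, (1, ("bs1", "bs2")): 4.0, (3, ("bs2",)): 6.0}

R_k = long_term_rate(activity, rate)  # 0.10 * 2.0 + 0.05 * 4.0 + 0.0 * 6.0
```

Here the user is served by a single BS on 10% of the RBs and by a two-BS cluster on another 5%, and the pair with zero activity fraction contributes nothing.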

¹²Convergence to the limiting expressions of interest is very quick [15]. 

IV. Unified NUM Problem Formulation 

While ALJTS allows for varying cluster sizes within an RB as revealed in Table II, in the sequel 
we specialize to the tractable case of practical interest involving equal-size clusters on each RB. 
Consequently, the following framework allows for cluster options as seen in RBs #1-3 but not for 
RB #4 in Table II. We call this new, additionally constrained, scheme the Uniform Cluster-Size scheme 
(UCS).¹³ 


Definition 2. Uniform Cluster-Size Scheme (UCS): ALJTS is a UCS if 

(1) a $\mu_A$ fraction of RBs is allocated to Band-$A$, with $\sum_A \mu_A \le 1$; 

(2) for each Band-$A$, a $\lambda_{AL}$ fraction of RBs is allocated to size-$L$ clusters for $1 \le L \le L_{\max}^{(A)}$, with 
$\sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \le \mu_A$; 

(3) on any RB in the $\lambda_{AL}$ fraction, the scheduled users are served by (user-dependent) clusters of the 
same size $L$ and these clusters are formed by BSs in $\mathcal{B}^{(A)}$; 

(4) on any RB in the $\lambda_{AL}$ fraction, each BS $j$ does not serve more than $S_j(L)$ users. 


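The LJT designs considered in the rest of the paper are all Uniform Cluster-Size Schemes. RBs 
allocated to serving size-$L$ clusters in Band-$A$ comprise what we call the $L$-th subband of Band-$A$. The 
NUM problem for the UCS, which optimizes activity fractions and subband/band allocations, is as follows: 

$$
\begin{aligned}
\max_{x,\lambda,\mu}\quad & \sum_{k\in\mathcal{U}} U\Bigg(\sum_{A=1}^{3}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,|C|\le L_{\max}^{(A)}} x_{kC}^{(A)}\, r_{kC}^{(A)}\Bigg) && (5a)\\
\text{s.t.}\quad & \sum_{k\in\mathcal{U}}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,j\in C,\,|C|=L} x_{kC}^{(A)} \le \lambda_{AL}\, S_j(L), \quad \forall j\in\mathcal{B}^{(A)},\ \forall L\le L_{\max}^{(A)},\ \forall A, && (5b)\\
& \sum_{C:\,|C|=L,\,C\subseteq\mathcal{B}^{(A)}} x_{kC}^{(A)} \le \lambda_{AL}, \quad \forall k\in\mathcal{U},\ \forall L\le L_{\max}^{(A)},\ \forall A, && (5c)\\
& \sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \le \mu_A, \quad \forall A, && (5d)\\
& \sum_{A=1}^{3} \mu_A = 1, && (5e)\\
& x_{kC}^{(A)},\ \lambda_{AL},\ \mu_A \ge 0, \quad \forall k\in\mathcal{U},\ \forall C,\ \forall L\le L_{\max}^{(A)},\ \forall A, && (5f)
\end{aligned}
$$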

where the utility function $U(\cdot)$ is a continuously differentiable, monotonically increasing, and strictly 
concave function [48]. Constraint (5b) signifies that the total activity fractions allocated by BS $j$ in 
clusters of size $L$ in Band-$A$ cannot exceed the total available resources $\lambda_{AL} S_j(L)$. On the other hand, 
recalling that each user cannot be served by multiple clusters on the same RB, (5c) signifies that the 
fraction of RBs over which user $k$ is served by clusters of size $L$ in Band-$A$ cannot exceed the fraction 
of RBs allocated to the clusters of size $L$ in Band-$A$, $\lambda_{AL}$. Constraint (5d) ensures that the total resources 
allocated to the subbands in Band-$A$ are no more than the resources allocated to that band. Finally, (5e) 
signifies the fact that the summation of resources allocated to different bands is equal to all available resources. 

¹³Unlike UCS, when clusters of different sizes are allowed to operate together, great care must be taken in the scheduling design to avoid 
overlapping clusters of different sizes operating in the same RB. This makes scheduling very complicated. The considered UCS provides 
a lower bound on the performance compared to more general ALJTS, which can serve as a useful benchmark. 

Remark 2. The formulation (5) is quite flexible. It can be applied not only to the scenarios that 
optimize the resource allocation among operations, but also to the scenarios where resources given to 
each operation are fixed a priori by setting the corresponding $\mu_A$ values to constants in (5). The cellular 
transmission [15] can be recovered as a special case of (5) by setting $A = 1$ and $L_{\max}^{(1)} = 1$. 

Any concave function (e.g., general $\alpha$-fairness [49]) can be applied to the problem formulation (5). 
The formulated optimization is convex as long as the utility function is concave [50]. In this paper, 
among various concave utility functions available in the literature, we work with the logarithmic utility, 
which is also known as “proportional fairness” [14,15,33].¹⁴ General numerical solvers (e.g., CVX) 
can be used. Since CVX is not well-suited for large instances [51], we alternatively propose in the next 
section an efficient algorithm that can be applied to large networks; the complexity difference between 
the proposed algorithm and a general numerical solver is investigated in Appendix C. 

V. Dual Subgradient Based Algorithm 

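In this section, we propose an efficient algorithm based on the dual subgradient method [50]. We 
let $\nu_{jL}^{(A)}$ and $\theta_{kL}^{(A)}$ be the Lagrange multipliers corresponding to (5b), with both sides divided by $S_j(L)$, 
and (5c), respectively. The dual problem of (5) is $\min_{\nu, \theta \ge 0} \sum_{k \in \mathcal{U}} f_k(\nu, \theta) + g(\nu, \theta)$, where 

$$
\begin{aligned}
f_k(\nu,\theta) = \max_{x_{kC}^{(A)} \ge 0}\ \Bigg\{ & \log\Bigg(\sum_{A=1}^{3}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,|C|\le L_{\max}^{(A)}} x_{kC}^{(A)}\, r_{kC}^{(A)}\Bigg)
 - \sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,|C|=L}\ \sum_{j:\,j\in C} \frac{\nu_{jL}^{(A)}}{S_j(L)}\, x_{kC}^{(A)} \\
 & - \sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,|C|=L} \theta_{kL}^{(A)}\, x_{kC}^{(A)} \Bigg\}, \qquad (6)
\end{aligned}
$$

$$
g(\nu,\theta) = \max_{\substack{\lambda,\mu \ge 0:\ \sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \le \mu_A,\ \forall A, \\ \sum_{A=1}^{3} \mu_A = 1}}\ \sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}} \Bigg(\sum_{j:\,j\in\mathcal{B}^{(A)}} \nu_{jL}^{(A)} + \sum_{k\in\mathcal{U}} \theta_{kL}^{(A)}\Bigg) \lambda_{AL}. \qquad (7)
$$

¹⁴There are many options for the utility function, such as (weighted) arithmetic mean, geometric mean, and max-min fairness. Each option 
can be relevant for certain scenarios and interests. Both arithmetic mean and max-min fairness have their shortcomings [39, Chapter 1]. 
Geometric mean (also known as proportional fairness) promotes a trade-off of user rates between the other two utility functions. The “log” 
function in the utility ensures diminishing returns for individual user rates as they get higher. This discourages the optimization from 
giving high rates to only a few users. 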

The constraints of (5) satisfy the Slater condition [50], and thus strong duality holds (i.e., the dual 
problem and the original problem (5) have the same optimal value). 

A. The Dual Subgradient Method 


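The optimization problem (6) has the closed-form optimal solution 

$$
x_{kC}^{(A)} = \begin{cases}
\dfrac{1}{\sum_{L:\,L=|C|}\Big(\sum_{j:\,j\in C} \nu_{jL}^{(A)}/S_j(L) + \theta_{kL}^{(A)}\Big)}, & \text{if } \{C, A\} = \{C^*, A^*\}, \\[2ex]
0, & \text{otherwise},
\end{cases} \qquad (8)
$$

where $\{C^*, A^*\} = \arg\max_{C,A}\ \dfrac{r_{kC}^{(A)}}{\sum_{L:\,L=|C|}\Big(\sum_{j:\,j\in C} \nu_{jL}^{(A)}/S_j(L) + \theta_{kL}^{(A)}\Big)}$.¹⁵ 

The problem (7) is an LP and one optimal solution is¹⁶ 

$$
\lambda_{AL} = \begin{cases}
1, & \text{if } \{A, L\} = \arg\max_{A', L'} \Big(\sum_{j:\,j\in\mathcal{B}^{(A')}} \nu_{jL'}^{(A')} + \sum_{k\in\mathcal{U}} \theta_{kL'}^{(A')}\Big), \\
0, & \text{otherwise},
\end{cases} \qquad (9)
$$

$$
\mu_A = \begin{cases}
1, & \text{if there exists a cluster size } L \text{ such that the above } \lambda_{AL} > 0, \\
0, & \text{otherwise}.
\end{cases} \qquad (10)
$$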
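The closed-form updates (8)-(10) can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: the dictionary layout and function names are hypothetical, `cost[(A, C)]` stands for the per-user price $\sum_{j \in C} \nu_{jL}^{(A)}/S_j(L) + \theta_{kL}^{(A)}$ with $L = |C|$ and is assumed strictly positive, and ties are broken deterministically (first maximizer) rather than randomly.

```python
def best_response(rates, cost):
    """Eq. (8) for one user.
    rates[(A, C)] is r_kC^(A); cost[(A, C)] is the positive price of that pair."""
    best = max(rates, key=lambda key: rates[key] / cost[key])
    return {key: (1.0 / cost[key] if key == best else 0.0) for key in rates}

def band_allocation(weight):
    """Eqs. (9)-(10).
    weight[(A, L)] is sum_j nu_jL^(A) + sum_k theta_kL^(A) for that subband."""
    best = max(weight, key=weight.get)
    lam = {key: (1.0 if key == best else 0.0) for key in weight}
    mu = {}
    for (band, _size), value in lam.items():
        mu[band] = max(mu.get(band, 0.0), value)
    return lam, mu

# Toy example with two candidate clusters in Band-1 for one user.
x = best_response(
    rates={(1, ("a",)): 2.0, (1, ("a", "b")): 3.0},
    cost={(1, ("a",)): 0.5, (1, ("a", "b")): 1.0},
)
lam, mu = band_allocation({(1, 1): 0.3, (1, 2): 0.7, (3, 1): 0.5})
```

In the toy example the single-BS option wins because its rate-to-price ratio (4) exceeds that of the two-BS cluster (3), and all resources are assigned to the subband with the largest aggregate price.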


¹⁵If we have multiple pairs of $\{C^*, A^*\}$, we just randomly pick one pair. 

¹⁶If we have multiple $\{A, L\}$ pairs that maximize $\sum_{j:\,j\in\mathcal{B}^{(A)}} \nu_{jL}^{(A)} + \sum_{k\in\mathcal{U}} \theta_{kL}^{(A)}$, we randomly pick one. 

The $n$-th iteration of the algorithm is as follows. 

1) Update the activity fractions by (8). 

2) Update resource allocation for different bands and clusters by (9) and (10). 

3) Update the Lagrangian multipliers by 

$$
\nu_{jL}^{(A)}(n+1) = \Bigg[\nu_{jL}^{(A)}(n) - \delta(n)\Bigg(\lambda_{AL}(n) - \frac{\sum_{k\in\mathcal{U}}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,j\in C,\,|C|=L} x_{kC}^{(A)}(n)}{S_j(L)}\Bigg)\Bigg]^+ \qquad (11)
$$

and 

$$
\theta_{kL}^{(A)}(n+1) = \Bigg[\theta_{kL}^{(A)}(n) - \delta(n)\Bigg(\lambda_{AL}(n) - \sum_{C:\,C\subseteq\mathcal{B}^{(A)},\,|C|=L} x_{kC}^{(A)}(n)\Bigg)\Bigg]^+, \qquad (12)
$$

where $[z]^+ = \max\{z, 0\}$ and $\delta(n)$ is the stepsize at the $n$-th iteration. 

By adding redundant constraints $x_{kC}^{(A)} \le 1$ and choosing an appropriate stepsize (e.g., a diminishing 
stepsize $\delta(n)$ defined through some positive scalars $a$ and $b$), the subgradients can be bounded. This 
allows us to leverage Prop. 6.3.4 in [50] to show the convergence of the dual subgradient algorithm. 
The detailed steps for the algorithm with redundant constraints can be found in Appendix D. 
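The multiplier step of the iteration can be sketched as follows for a single (band, cluster-size) pair. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the BS-side constraint is used in its normalized form (load divided by $S_j(L)$), and the rule `a / (b + n)` is only one common diminishing stepsize, chosen here for illustration.

```python
def stepsize(n, a=1.0, b=10.0):
    """A common diminishing rule a / (b + n); the exact rule is an assumption here."""
    return a / (b + n)

def update_multipliers(nu, theta, lam, load, served, S, step):
    """One projected subgradient step for a single (band A, cluster size L) pair.

    nu[j], theta[k]: current multipliers; lam: lambda_AL at this iteration;
    load[j]: sum of x_kC over users k and size-L clusters C that contain BS j;
    served[k]: sum of x_kC over size-L clusters C; S[j]: S_j(L).
    """
    new_nu = {j: max(nu[j] - step * (lam - load[j] / S[j]), 0.0) for j in nu}
    new_theta = {k: max(theta[k] - step * (lam - served[k]), 0.0) for k in theta}
    return new_nu, new_theta

new_nu, new_theta = update_multipliers(
    nu={"j1": 1.0},
    theta={"k1": 0.05, "k2": 0.5},
    lam=1.0,
    load={"j1": 4.0},
    served={"k1": 0.0, "k2": 2.0},
    S={"j1": 2},
    step=stepsize(0),
)
```

A multiplier grows when its constraint is violated (BS `j1` and user `k2` in the example) and shrinks, but never below zero, when there is slack (user `k1`).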

B. Finding the Optimal Primal Solutions Given the Optimal Dual Variables 

Note that the objective function of (5) is not strictly concave in the activity fractions and we may have 
multiple optimal solutions. In this case, given the optimal dual variables, it is generally difficult to find 
the optimal primal solutions that satisfy the KKT conditions. However, by exploiting the structure of (5) 
as follows, we propose to obtain the optimal primal solutions by solving a small-size LP. 

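The optimal long-term rate $R_k^* = \sum_{A=1}^{3} \sum_{C:\,C\subseteq\mathcal{B}^{(A)}} x_{kC}^{(A)*}\, r_{kC}^{(A)}$ is unique, since the function $\log(R_k)$ 
is strictly concave with respect to $R_k$. KKT conditions of problem (5) imply 

$$
R_k^* \ge \frac{r_{kC}^{(A)}}{\sum_{j\in C} \nu_{j|C|}^{(A)*}/S_j(|C|) + \theta_{k|C|}^{(A)*}}, \qquad \forall C,\ \forall A. \qquad (13)
$$

Thus, given the optimal dual variables, the unique optimal rate can be easily obtained by 
$R_k^* = \max_{C,A} \dfrac{r_{kC}^{(A)}}{\sum_{j\in C} \nu_{j|C|}^{(A)*}/S_j(|C|) + \theta_{k|C|}^{(A)*}}$. We observe from (13) that in the optimal solutions, each user only 
has positive activity fractions to clusters providing the maximum term of the right-hand side of 
(13). Based on this conclusion, we propose the following LP, whose size is reduced by only focusing 
on the positive $x_{kC}^{(A)}$ obtained from (13). 

$$
\begin{aligned}
\max_{\eta, x, \lambda, \mu}\quad & \eta \\
\text{s.t.}\quad & \sum_{A=1}^{3}\ \sum_{C:\,C\subseteq\mathcal{B}^{(A)}} x_{kC}^{(A)}\, r_{kC}^{(A)} \ge \eta\, R_k^*, \quad \forall k\in\mathcal{U}, \\
& \text{(5b)-(5f)}.
\end{aligned} \qquad (14)
$$

Proposition 1. Given that $R_k^*$ is the exact optimal rate of (5), the solution of (14) is the same as the 
optimal solution of problem (5). 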

Proof: Similar techniques in the proof of Lemma 1 in [15] can be used to complete this proof. ■ 
Prop. 1 implies that we can obtain the solutions of (5) given the optimal dual variables. Though 
we can show the convergence of the dual subgradient algorithm by adding redundant constraints, there 
may exist a small gap between the obtained dual variables and the optimal ones, due to the numerical 
precision or the limit on the number of iterations. Exploiting the well-behaved structure of (14), i.e., 
finite coefficients and a bounded feasible set [15], it is expected that the solution of (14) is near optimal 
in the presence of a small gap between the obtained dual variables and the optimal ones. 

Empirical evidence reveals that in a heavily loaded network, where constraints (5c) are inactive (i.e., 
$\sum_{C:\,|C|=L} x_{kC}^{(A)} < \lambda_{AL}$), most users are uniquely served by one cluster on each subband. Insight regarding 
this observation can be obtained by examining KKT conditions of (5) as follows. 

Proposition 2. For a given Band-$A$ and a cluster size $L$, if (5c) are inactive $\forall k \in \mathcal{U}$, the number of 
users that are served by multiple BS clusters on RBs allocated to the $L$-th subband of Band-$A$ is at most 
$N_{CL}^{(A)} - 1$, where $N_{CL}^{(A)}$ is the number of clusters in the $L$-th subband of Band-$A$. 

Proof: See Appendix B. ■ 

Prop. 2 implies that the optimal user associations in each subband are mostly unique. We call the users 
served by more than one cluster on any subband “fractional users”. Note that Prop. 2 provides an upper 
bound (i.e., $N_{CL}^{(A)} - 1$) on the number of fractional users, while simulations show a much smaller number of 
fractional users (less than 3.5% in Sec. VII). Recall that the dual subgradient algorithm determines 
the set of positive $x_{kC}^{(A)}$ of users to their cluster-band pairs $\{C^*, A^*\}$, while the rest of the activity fractions 
are zero. Thus, the unknown activity fractions that need to be solved via (14) are only the positive activity 
fractions. Based on Prop. 2, most users (with unique association) have at most one positive activity 
fraction on any subband. Thus, the size of (14) is significantly reduced, implying the efficiency of the 
proposed algorithm. Further details on the algorithm complexity can be found in Appendix C. 

In summary, Proposition 1 has revealed the optimality of the specific method proposed in Sec. V-B 
to obtain the primal variables given the dual variables. Furthermore, the analysis of the number of 
iterations required for convergence of the proposed algorithm and its complexity reveal the efficiency 
of the proposed algorithm with respect to its application to large network instances. Unlike the cellular 
case [15], it is not a priori known whether the NUM solution can be implemented via any scheduler 
or not. The implementation of NUM solutions is discussed below. 

VI. Scheduling 

In this section, we develop scheduling policies that yield activity fractions closely matching the NUM 
solution. Scheduling is done independently and in parallel for each band. As seen in Fig. 2, a scheduler at 
a central controller can collect the needed scheduling information and schedule the users according to the 
proposed scheduling scheme independently for each band (i.e., for each operation option). Considering 
a scheduling policy for Band-$A$ and letting $L(t)$ be the cluster size in RB $t$, we define the feasible 
scheduling policy as follows. 

Definition 3. Feasible Schedule: A scheduling policy $\{\mathcal{U}_C(t);\ \forall C \subseteq \mathcal{B}^{(A)}, |C| \le L_{\max}^{(A)}, \forall t \text{ in Band-}A\}$ 
is feasible with respect to the UCS based on Defn. 2, if it satisfies the following: 

(i) For each $t$, the policy assigns RB $t$ to clusters with $C \subseteq \mathcal{B}^{(A)}$ and $|C| = L(t)$ in Band-$A$; that is, 
for each cluster $C$ with $\mathcal{U}_C(t)$ being non-empty, we have $C \subseteq \mathcal{B}^{(A)}$ and $|C| = L(t)$. 

(ii) For each $t$, each user is served by at most one cluster; that is, $|\{C \subseteq \mathcal{B}^{(A)} : k \in \mathcal{U}_C(t)\}| \le 1$ for every user $k$. 

(iii) For each $t$ in Band-$A$ and for each BS $j \in \mathcal{B}^{(A)}$, BS $j$ serves at most $S_j(L(t))$ users; that is, 
$\Big|\bigcup_{C:\, j \in C,\, C \subseteq \mathcal{B}^{(A)}} \mathcal{U}_C(t)\Big| \le S_j(L(t))$. 

A. The Feasibility of the NUM Solution in Implementation 

It is easy to verify that the activity fractions $x_{kC}^{(A)}$ yielded by any feasible schedule defined by Defn. 3 
satisfy (5b)-(5f). In fact, when $L_{\max}^{(A)} = 1$ (i.e., cellular cases), there exists at least one feasible schedule 
that can provide long-term activity fractions approaching the solution of (5) [15]. However, in the general 
case $L_{\max}^{(A)} > 1$ this is not necessarily true. For instance, for networks with cluster combinations $\{j_1, j_2\}$, 
$\{j_1, j_3\}$ and $\{j_2, j_3\}$, where $j_1$, $j_2$ and $j_3$ are BS indexes, there exist activity fractions $x_{kC}^{(A)}$ satisfying 
(5b)-(5f), for which no feasible schedule of Defn. 3 exists. 

Theorem 1. In the UCSs with $L_{\max}^{(A)} > 1$ in some Band-$A$ and with the type of cluster combinations 
$\{j_1, j_2\}$, $\{j_1, j_3\}$ and $\{j_2, j_3\}$, where $j_1$, $j_2$ and $j_3$ are BSs in $\mathcal{B}^{(A)}$, there exist some activity fractions 
satisfying (5b)-(5f) that cannot be implemented by any feasible schedule in Defn. 3. 

Proof: See Appendix A. 

Hence, the coarser time-scale NUM problem (5) does not capture the finer time-scale constraints 
associated with feasible schedulers. Although, in general, (5) provides an upper bound on the network 
performance, as we show next, using activity fractions that are the solution to (5), we can design 
scheduling policies whose performance is close to the utility provided by the solution to (5). 


B. Virtual Queue Based Scheduling Scheme 

We next present scheduling policies for the UCS architecture comprised of $\sum_{A=1}^{3} L_{\max}^{(A)}$ parallel 
schedulers, one per subband. We describe a method for scheduling users over the RBs from the 
$\lambda_{AL} > 0$ fraction of RBs dedicated to clusters of size $L$ in Band-$A$. 

Given the limited number of fractional users per cluster size $L$, the scheduler approximates the optimal 
$x_{kC}^{(A)}$ by unique association activity fractions, given by 

$$
\tilde{x}_{kC}^{(A)} = \begin{cases}
\ldots, & \text{if } C = C^*(k), \\
0, & \text{otherwise},
\end{cases} \qquad (15)
$$

with $C^*(k) = \arg\max_{C:\,|C|=L,\,C\subseteq\mathcal{B}^{(A)}} x_{kC}^{(A)}$. Letting $\mathcal{U}_C^{(AL)}$ denote the users for which $\tilde{x}_{kC}^{(A)} > 0$, we have 
$\mathcal{U}_C^{(AL)} \cap \mathcal{U}_{C'}^{(AL)} = \emptyset$ for all $C \ne C'$ with $|C| = |C'|$. We also let $\mathcal{U}^{(AL)} = \bigcup_{C:\,|C|=L} \mathcal{U}_C^{(AL)}$ denote the set of users 
that receive non-zero activity fractions from clusters of size $L$ in Band-$A$. In the rest of this section, 
we focus on clusters $C$ satisfying $|C| = L$ and $C \subseteq \mathcal{B}^{(A)}$ unless otherwise specified. 

To assign user $k$ a fraction of RBs close to the desired fraction in the $L$-th subband of Band-$A$, 
i.e., $a_k = \tilde{x}_{kC^*(k)}^{(A)} / \lambda_{AL}$, we consider a max-min scheduling policy based on virtual queues (VQ), which 
assumes user $k$ receives rate $R_k = 1/a_k$ when user $k$ is scheduled for transmission by cluster $C^*(k)$ 
(i.e., $k \in \mathcal{U}_{C^*(k)}(t)$). The cluster-size $L$ scheduler performs at each $t$ a weighted sum rate maximization 
(WSRM) of the form [52]: 

$$
\begin{aligned}
\max_{\mathcal{U} \subseteq \mathcal{U}^{(AL)}}\quad & \sum_{k \in \mathcal{U}} Q_k(t)\, R_k && (16a)\\
\text{s.t.}\quad & \sum_{k \in \mathcal{U}} \mathbb{1}\{j \in C^*(k)\} \le S_j(L), \quad \forall j \in \mathcal{B}^{(A)}, && (16b)
\end{aligned}
$$

where the weight of user $k$ at time $t$, $Q_k(t)$, is the VQ length of user $k$ at time $t$. For max-min fairness 
[52], $Q_k(t)$ is updated by $Q_k(t+1) = \max\{0, Q_k(t) - R_k(t)\} + A_k(t)$, where 

$$
R_k(t) = \begin{cases}
R_k, & \text{if user } k \text{ is scheduled at time } t, \\
0, & \text{otherwise},
\end{cases} \qquad (16c)
$$

$$
A_k(t) = \begin{cases}
A_{\max}, & \text{if } V > \sum_{k} Q_k(t), \\
0, & \text{otherwise},
\end{cases} \qquad (16d)
$$

with $A_{\max}$ and $V$ chosen sufficiently large [52]. Note that in the absence of constraints (16b), the 
max-min scheduler (16) schedules user $k$ the desired fraction of RBs, $a_k$. 
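The virtual queue recursion can be sketched as below. This is an illustration only: the names are hypothetical, and the admission rule (add `A_max` to every queue while the total backlog is below `V`) follows the reading of (16d) used here, which should be treated as an assumption.

```python
def arrival(queues, V, A_max):
    """Virtual arrival A_k(t): A_max while the total backlog is below V, else 0 (assumed form)."""
    return A_max if V > sum(queues.values()) else 0.0

def update_queues(queues, scheduled, rate, V, A_max):
    """Q_k(t+1) = max{0, Q_k(t) - R_k(t)} + A_k(t), with R_k(t) = R_k only if k is scheduled."""
    a = arrival(queues, V, A_max)
    return {
        k: max(0.0, q - (rate[k] if k in scheduled else 0.0)) + a
        for k, q in queues.items()
    }

queues = {"u1": 5.0, "u2": 1.0}
rate = {"u1": 2.0, "u2": 4.0}
next_open = update_queues(queues, scheduled={"u1"}, rate=rate, V=10.0, A_max=3.0)
next_closed = update_queues(queues, scheduled={"u1"}, rate=rate, V=5.0, A_max=3.0)
```

A user that is not scheduled keeps its backlog, so its weight in (16a) grows relative to served users, which is what pushes the schedule toward the target fractions.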

Scheduling via (16) is impractical, as it amounts to solving an integer linear program (16) for each 
RB $t$. A number of heuristic algorithms can be used to provide feasible (though generally suboptimal) 
solutions to (16). In this paper, we consider a rudimentary greedy algorithm. Letting $K_{AL} = |\mathcal{U}^{(AL)}|$ 
be the total number of users to be served by clusters of size $L$, the greedy algorithm for size-$L$ clusters 
at time $t$ operates as follows: 

1. Determine a user order $\pi(k)$, where $Q_{\pi(k)}(t) R_{\pi(k)} \ge Q_{\pi(k+1)}(t) R_{\pi(k+1)}$ for all $k \in \{1, \ldots, K_{AL} - 1\}$. 

2. Initialization: $k = 1$, and $\mathcal{U} = \emptyset$. 

3. If the user set $\mathcal{U} \cup \{\pi(k)\}$ satisfies all the constraints in (16b), set $\mathcal{U} \leftarrow \mathcal{U} \cup \{\pi(k)\}$. 

4. If $k < K_{AL}$, set $k = k + 1$ and go to step 3. 

5. Output $\mathcal{U}$ as the scheduling user set for size-$L$ clusters in Band-$A$ at time $t$. 
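The five steps above can be sketched as a short function. The data layout is hypothetical: `cluster_of[k]` is the tuple of BSs forming $C^*(k)$, `S[j]` is the per-BS user limit $S_j(L)$, and the toy numbers are chosen to show that the greedy choice can be suboptimal.

```python
def greedy_schedule(queues, rate, cluster_of, S):
    """Greedy heuristic for (16): visit users by decreasing Q_k * R_k and admit a
    user only if every BS in its cluster still has a free slot."""
    order = sorted(queues, key=lambda k: queues[k] * rate[k], reverse=True)
    load = {j: 0 for j in S}
    scheduled = []
    for k in order:
        if all(load[j] < S[j] for j in cluster_of[k]):
            scheduled.append(k)
            for j in cluster_of[k]:
                load[j] += 1
    return scheduled

queues = {"u1": 3.0, "u2": 5.0, "u3": 4.0}
rate = {"u1": 2.0, "u2": 1.0, "u3": 1.0}          # weights: u1 = 6, u2 = 5, u3 = 4
cluster_of = {"u1": ("b1", "b2"), "u2": ("b1",), "u3": ("b2",)}
S = {"b1": 1, "b2": 1}

chosen = greedy_schedule(queues, rate, cluster_of, S)
```

In this example the greedy rule admits only `u1` (weight 6), because `u1` occupies the single slot of both BSs, whereas serving `u2` and `u3` together would give a weighted sum of 9. This illustrates why the greedy output is feasible but generally suboptimal for (16).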


VII. Performance Evaluation 

In this section, we present a simulation-based evaluation based on the “wrap-around” layout in Fig. 3. 
We also present the simulation results with the network deployment including more hexagonal modeled 
macrocells and non-uniformly distributed users (based on the 3GPP layout in TR 36.872).¹⁷ 

The parameters used for the layout in Fig. 3 are given as follows unless otherwise specified. There 
are 4 macros with $M_j = 100$ and $S_j(|C|) = \max\{10\rho|C|, 10\}$, and 32 small cell BSs with $M_j = 40$ 
and $S_j(|C|) = \max\{4\rho|C|, 4\}$, where $\rho$ is a tunable parameter in $[0,1]$. There is 1 small cell BS at the 
center of each white square, while 3 small cell BSs are dropped uniformly within each shaded square 
(hotspot). Also, 15 and 90 single-antenna users are dropped uniformly in each white and shaded square, 
respectively. The macro and small cell BS transmit powers are 46 dBm and 35 dBm, respectively. The 
path losses (in dB) for macro-user links and small cell BS-user links are $128.1 + 37.6 \log_{10} d$ and $140.7 + 36.7 \log_{10} d$, 
respectively, with the distance $d$ in km. The noise power spectral density is -174 dBm/Hz. 
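As a quick sanity check of these parameters, the following sketch computes the average received power per link and the noise power. The function names are hypothetical, shadowing and beamforming gains are ignored, and the bandwidth passed to `noise_dbm` is an assumption because it is not specified in this section.

```python
import math

def path_loss_db(d_km, macro):
    """Path loss models quoted above, with the distance d in km."""
    if macro:
        return 128.1 + 37.6 * math.log10(d_km)
    return 140.7 + 36.7 * math.log10(d_km)

def rx_power_dbm(d_km, macro):
    """Average received power: transmit power (46 or 35 dBm) minus path loss."""
    tx_dbm = 46.0 if macro else 35.0
    return tx_dbm - path_loss_db(d_km, macro)

def noise_dbm(bandwidth_hz):
    """Thermal noise power for a -174 dBm/Hz density over the given bandwidth."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz)

macro_at_1km = rx_power_dbm(1.0, macro=True)     # 46 - 128.1
small_at_100m = rx_power_dbm(0.1, macro=False)   # 35 - (140.7 - 36.7)
noise_1mhz = noise_dbm(1e6)                      # bandwidth of 1 MHz is illustrative only
```

At equal distance the macro link is much stronger than the small cell link, which is why max-SINR association tends to attach most users to macro BSs in the shared scenario.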

¹⁷We assume a full-buffer traffic model, while the study of more general traffic models is left for future work. 

We consider three distinct macro-small cell resource sharing scenarios: (i) the shared scenario with 
macros and small cell BSs transmitting on the same RBs (operations with $A = 1$); (ii) the orthogonal 
scenario with macros and small cell BSs transmitting on different bands (operations with $A \in \{2,3\}$), 
where we provide macros (Band-2) 20% of the RBs as an illustrative example; (iii) RB blanking with macros 
muted on certain RBs (operations with $A \in \{1,3\}$). Note that, although we can jointly optimize the 
resource partition (i.e., $\mu_A$) among different bands and user activity fractions (i.e., $x_{kC}^{(A)}$) using (5) in the 
orthogonal scenario, we fix $\mu_A$ for the following reasons. The resource partition among macro and 
small cells is most likely static (or semi-static) in practice. Moreover, the macro and small cells may 
operate on different frequency bands (e.g., the macro and small cells may transmit on lower-frequency 
bands and higher-frequency bands, respectively), where $\mu_A$ then depends on the available resources on 
each band and thus is not a variable to optimize. As for the selection of the fixed values for $\mu_A$, we set 
$\mu_2 = 0.2$ and $\mu_3 = 0.8$ as an illustrative example. In our selection, we let $\mu_3$ be larger than $\mu_2$ since 
the small cells are deployed more densely than the macro BSs. Our simulation results can be easily 
updated with other values. For completeness, we provide the simulation results in the orthogonal 
scenarios with different $\mu_A$ at the end of subsection VII-A. 

We make comparisons between the conventional approach (i.e., max-SINR), the approach of [15] 
and the proposed UCS of this work. Both max-SINR and [15] are cellular approaches. In fact, the 
formulation of [15] is equivalent to UCS in Scenario (i) with $L_{\max}^{(1)} = 1$. 

$L_{\max}^{(A)}$ depends on the band. For our simulations, we consider $L_{\max}^{(1)} \in \{1,4\}$ and $L_{\max}^{(3)} \in \{1,4\}$. For 
Band-2, only cellular transmission from macro BSs is allowed, and hence $L_{\max}^{(2)} = 1$. The number of all 
possible clusters of size greater than 4 is too large for any practical purpose. Besides, not all subsets of 
BSs are good candidates for being clusters. We determine the set of potential BSs from the perspective 
of users: we let each user pick the 8 BSs providing the largest signal strength to that user,¹⁸ 
and the potential BS clusters that can serve the user only include BSs among these 8 BSs. 

There is a one-to-one mapping between the log utility and the geometric mean of rates as 
$\left(\prod_{k=1}^{K} R_k\right)^{1/K} = \exp\left(\frac{1}{K}\sum_{k=1}^{K} \log R_k\right)$; thus we use the geometric mean of rates as the metric for performance evaluation. 
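The mapping between the log utility and the geometric mean is easy to check numerically. The helper below is a minimal sketch with a hypothetical function name; it computes the geometric mean through the average of logarithms, which also avoids overflow when many rates are multiplied.

```python
import math

def geometric_mean(rates):
    """exp of the average log rate, i.e. the K-th root of the product of K rates."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

rates = [1.0, 4.0, 16.0]
gm = geometric_mean(rates)                    # cube root of 64
log_utility = sum(math.log(r) for r in rates)
```

Because the exponential is strictly increasing, ranking two allocations by `log_utility` or by `gm` gives the same order.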

A. Simulation of Layout 1 (Figure 3) 

Fig. 4(a) shows the geometric mean of rates in scenarios (i) and (ii). The optimal solution to (5), hereby 
denoted as UCS-NUM, is obtained by CVX. We provide performance comparisons between the CVX 
solution of (5) and the solution of the dual subgradient based algorithm. The 
latter has almost the same performance as the NUM solution, which validates our analysis. We observe 
this also for our later simulations, hence the results of the dual algorithm are skipped in the following 
figures for the sake of clarity. It can be seen that when the solution to (5) is approximated by (15) with 
unique association, the utility loss is insignificant thanks to the very small number of fractional users (as 
shown in Prop. 2). Moreover, the proposed greedy VQ scheduling scheme provides performance close 
to the NUM solution, and in particular within 90% of the utility provided by the NUM solution in both 
scenarios (i) and (ii). Note that in cellular transmission, the NUM solution is feasible via some scheduler, 
and thus VQ based scheduling is unnecessary [15]. We can observe that the UCS significantly improves 
the geometric mean of rates versus the optimal cellular performance and the max-SINR association 
(about 1.6x in the shared scenario and 1.35x in the orthogonal scenario versus the optimal cellular 
result). 

Fig. 3. The illustration of network deployment. The white grids are the regular areas, while the shadowed grids are hotspots. 

¹⁸We pick the strongest 8 BSs, since the performance of picking the strongest 9 BSs is almost the same as the 8-BS case, while the 
utility of picking the strongest 7 BSs is less than the 8-BS case. 

In Fig. 4(b), we compare the performance of scenarios (i) and (iii). We can observe that RB blanking 
further improves the network utility. 

Observation of Fig. 5 yields similar conclusions. Indeed, Fig. 5 shows the rate cumulative distribution 
function (CDF) with different approaches. We illustrate the results of the shared and RB blanking 
scenarios in the same figure, as RB blanking is essentially motivated by the need to manage, in the shared 
scenario, the interference from macros to small cell users. The rate of bottom (the 10th percentile) users 
using UCS in scenario (i) is about 2.2x that of the optimal cellular solution of [15]. The gain is even larger 
in scenario (ii). 

The number of users served by different clusters with UCS is illustrated in Fig. 6. In the shared 
scenario with max-SINR association, most users connect to macro BSs, since macro BSs have much 
larger transmit power than small cell BSs. By load balancing, many users are offloaded to small BSs 
in the optimal cellular solution. In our proposed framework, all users are served by BS clusters with 
multiple BSs, which implies the potential gain using UCS. In the orthogonal scenario, there is no cross-tier 
interference and more users may get larger SINR from small BSs than macro BSs, hence more 
users connect to small cell BSs in the max-SINR association compared to the shared scenario. Due 
to the limited resources (20% of the RBs) available in macro BSs, more users are offloaded to small BSs 
using the load balancing approach in orthogonal cellular transmission. For scenario (i), the percentage 
of fractional users is about 3.3% using UCS, and 1.2% in the case with optimal cellular. In the RB 
blanking scenario, the percentage of fractional users in the case using LJT with RB blanking (scenario 
(iii)) is about 2.5%, while the percentage of fractional users adopting cellular transmission with blanking 
is less than 1%. Thus, we can conclude that the number of fractional users in all cases is very small, 
which validates our analysis. 

Fig. 4. The geometric mean of rates using different approaches ($\rho = 1$). Panels: (a) Scenario (i) and Scenario (ii); (b) Scenario (i) and 
Scenario (iii). (a) The UCS with VQ based scheduling scheme provides a large performance gain (about 1.6x and 1.35x in the shared and 
orthogonal scenarios, respectively) versus the optimal cellular result. (b) RB blanking further improves the network performance. 

Fig. 5. The long-term rate CDF using different approaches ($\rho = 1$). Panels: (a) Scenarios (i) and (iii); (b) Scenario (ii). The rate of bottom 
(10th percentile) users using UCS is about 2.2x that of the cellular transmission case with optimal user association but without 
interference management. 

Fig. 6. The number of users served by different clusters ($\rho = 1$). Panels: (a) Scenario (i); (b) Scenario (ii). Most users have unique 
association. The “Cluster UEs” refer to the users served by clusters of size larger than 1. 

Fig. 7. (a) The geometric mean of rates using different approaches versus $\rho$. As $\rho$ decreases, the gain from JT decreases in both shared 
and orthogonal scenarios. (b) The fraction of resources allocated to clusters of different sizes in the RB blanking scenario. As $\rho$ decreases, 
more resources are allocated to clusters with smaller size. 

In Fig. 7(a), we show the geometric mean of rates versus p. We observe that the performance
gain using JT decreases as p decreases, since the number of users that can be served by clusters
decreases. This implies that the gain from JT increases as more UL pilot resources become available in
the system. With limited UL pilot resources, the gain from JT would be quite small.

Fig. 8. The geometric mean of rates in Scenario (ii) (the orthogonal scenario) with different fractions of resources allocated
to macro and small cell layers.
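
The geometric mean used as the performance metric in Figs. 7(a) and 8 is the exponential of the average log-rate, so ranking schemes by it is equivalent to ranking them by a sum-log utility. The following is a minimal sketch of that computation; the function names `geometric_mean` and `sum_log_utility` and the rate values are illustrative assumptions, not simulation results.

```python
import math


def geometric_mean(rates):
    """Geometric mean of strictly positive rates, computed in the log domain."""
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


def sum_log_utility(rates):
    """Logarithmic utility of the same rates (sum of log-rates)."""
    return sum(math.log(r) for r in rates)


# Hypothetical per-user rates in arbitrary units (not taken from the simulations).
baseline = [1.0, 4.0, 16.0]
improved = [2.0, 4.0, 16.0]  # only the worst user is improved

print(geometric_mean(baseline))                             # close to 4.0
print(geometric_mean(improved) / geometric_mean(baseline))  # close to 2 ** (1 / 3)
```

Because the metric is multiplicative, doubling the rate of the worst user raises it by the same factor as doubling the rate of the best user, which is why it is sensitive to cell-edge improvements.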

Fig. 7(b) illustrates the resource allocation for clusters of different sizes versus p in the RB blanking
scenario. The macro BSs are off for about 65% of RBs in cellular transmission. In scenario (iii), as p
decreases, the clusters serve fewer users, and more resources are allocated to the clusters of smaller
sizes. When p = 0.25, all resources are allocated to single-BS clusters in normal RBs, and most of the
resources are allocated to single-BS clusters in blank RBs. This again suggests that when the available
pilot resources are strictly constrained, the gain from JT would be limited.

Fig. 8 illustrates the simulation results in the orthogonal scenario (Scenario (ii)) with different values
of $\mu_2$. As $\mu_2$ increases, the utility of max-SINR association increases, because most users are
associated with macro BSs under max-SINR association and more resources become available to these macro
users. On the other hand, if the resources available to small cells are limited,
there is less motivation for load balancing (i.e., pushing users from macro to small cells),
since offloaded users still suffer from limited resources and thus limited rates. Therefore,
the gain from load balancing and JT is limited when a small fraction of resources is allocated to
small cells (i.e., when $\mu_2$ is large), as can be observed from Fig. 8.

B. Simulation of Layout 2 (Figure 9) 

In this subsection, we provide similar simulation results for a network topology compliant with the 3GPP
HetNet scenario [7], as shown in Fig. 9. In particular, we have a cellular layout with 7 macro-cell BSs
and 3 hotspots per macrocell. Within each hotspot region there are 4 randomly dropped small cell BSs.
120 UEs are uniformly dropped in each hotspot region while 60 more UEs are dropped randomly in
the whole coverage area of each macro cell. The macro/small cell powers and the pathloss models used
in this experiment are identical to those used in the previous layout.

Fig. 9. The illustration of 3GPP layout.

Fig. 10(a) compares the geometric mean of rate with various methods in scenarios (i) and (ii), and
Fig. 10(b) presents the geometric mean of rate with various methods in scenarios (i) and (iii). Similar
to the layout illustrated in Fig. 3, we also observe a significant gain in geometric mean of rate by using
JT in all considered scenarios. Specifically, JT with the VQ based scheduling scheme provides a
large performance gain (about 1.35x) versus the optimal cellular result, as illustrated in both Figs. 10(a)
and 10(b).

Figs. 11(a) and 11(b) illustrate the user rate CDFs with different approaches. An increase of 83% can
be observed for the cell-edge users at the 10th percentile compared to the cellular case with optimal
load balancing but no interference management. It can also be observed that joint RB blanking and JT
further improves the network performance.
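
The cell-edge comparison above reads one point off each rate CDF. As a minimal sketch of how such a figure can be obtained from per-user long-term rates, the snippet below uses a nearest-rank percentile; the helper `percentile_rate` and the rate lists are hypothetical and are not the data behind Fig. 11.

```python
def percentile_rate(rates, q):
    """Nearest-rank q-th percentile (q an integer in 1..100) of a list of rates."""
    if not rates or not 1 <= q <= 100:
        raise ValueError("need a non-empty list and an integer q in 1..100")
    ordered = sorted(rates)
    rank = (q * len(ordered) + 99) // 100  # ceil(q * n / 100) in integer arithmetic
    return ordered[rank - 1]


# Hypothetical long-term rates for ten users under two schemes (arbitrary units).
cellular = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
with_jt = [2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

edge_gain = percentile_rate(with_jt, 10) / percentile_rate(cellular, 10)
print(edge_gain)  # ratio of the 10th-percentile rates of the two schemes
```

Other percentile conventions (for example, linear interpolation between order statistics) give slightly different values for small user populations.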

(a) Scenarios (i) and (ii) (b) Scenarios (i) and (iii)

Fig. 10. Geometric mean of rate in 3GPP layout with p = 1.

Fig. 11. The long-term rate CDF in 3GPP layout with p = 1.

VIII. Conclusion

In this paper, we investigate the joint optimization of user association and interference
management in massive MIMO HetNets. We consider both JT and RB blanking approaches for
interference management. We first provide the instantaneous rate from BS clusters by exploiting massive
MIMO properties, namely rate hardening and the independence of the peak rate from the user scheduling.
We then formulate a convex NUM problem to obtain the optimal user-specific BS clusters and the
corresponding resource allocation. The unified formulation can be applied to both JT and blanking
approaches, as well as the case where macro and small BSs use orthogonal resources. We further propose
an efficient dual subgradient based algorithm, which is shown to converge towards the NUM solution.
We show that the NUM solution with JT may not be implementable by a feasible scheduler, and thus it
provides an upper bound on the performance. Showing that most users connect to at most one cluster per
RB in heavily loaded networks, we propose to approximate the NUM solution by a unique association,
given which we propose a VQ based scheduling scheme to provide approximate but implementable
results. Simulations show that the proposed scheduling scheme yields results that closely match the
NUM solution. Investigations involving more dynamic settings (e.g., users with high mobility) and the
impact of different factors (such as imperfect CSI acquisition and the number of users simultaneously served
by BSs in different cluster sizes) on the overall system performance are all subjects of future work. It
is also of interest to theoretically bound the gap between the NUM solution and the results of the proposed
VQ based scheduling scheme.


Appendix A 
Proof of Theorem 1 

We adopt techniques similar to those in the proof of Theorem 1 in [15]. Once the statement for one cluster
size in one band is proven, the conclusion can be easily extended to general ATSs with various cluster
sizes and multiple bands. Thus, we focus on clusters of size $L$ in Band-A (i.e., subband $L$ in Band-A).
We ignore the index $A$ for simplicity. All the clusters considered below satisfy $\mathcal{C} \subseteq \mathcal{B}^{(A)}$ and $|\mathcal{C}| = L$.

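The set of feasible scheduling instants is denoted by $\mathcal{F}$, which includes vectors $\mathbf{e}$ with elements
$e_{k\mathcal{C}} \in \{0, 1\}$, where $e_{k\mathcal{C}} = 1$ if user $k$ connects to $\mathcal{C}$ and $e_{k\mathcal{C}} = 0$ otherwise. According to Defn. 3, $\mathbf{e}$
consists of elements $\{e_{k\mathcal{C}}\}$ such that user $k$ connects to at most one cluster and BS $j$ serves at most $S_j(L)$
distinct users. By time sharing among the feasible scheduling instants in $\mathcal{F}$, any fractional association in
the convex hull of $\mathcal{F}$ can be achieved in the long term. We denote the convex hull of $\mathcal{F}$ by $\mathcal{X}' = \mathrm{conv}(\mathcal{F})$
and the set of activity fractions associated with clusters in Band-A satisfying the constraints in (5) by $\mathcal{X}$, i.e.,

$$\mathcal{X} = \Big\{ x_{k\mathcal{C}} : \sum_{\mathcal{C}: j \in \mathcal{C}} \sum_{k \in \mathcal{U}} x_{k\mathcal{C}} \le S_j(L)\ \ \forall j, \quad \sum_{\mathcal{C}} x_{k\mathcal{C}} \le 1\ \ \forall k \in \mathcal{U}, \quad x_{k\mathcal{C}} \ge 0 \Big\}.$$

It is easy to show that any feasible scheduling instant in $\mathcal{F}$ satisfies the constraints (5b)-(5f), and thus
$\mathcal{F} \subseteq \mathcal{X}$. Note that $\mathcal{X}$ is convex. Thus, we have $\mathcal{X}' = \mathrm{conv}(\mathcal{F}) \subseteq \mathcal{X}$.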

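As for the opposite direction (i.e., $\mathcal{X} \not\subseteq \mathcal{X}'$), we first define the totally unimodular (TU) matrix:
every square submatrix of a TU matrix has determinant $+1$, $-1$ or $0$. The Hoffman and Kruskal (1956)
theorem states that a matrix $\mathbf{B}$ is TU if and only if, for each integral vector $\mathbf{b}$, the extreme points
of the polyhedron $\{\mathbf{z} : \mathbf{B}\mathbf{z} \le \mathbf{b}, \mathbf{z} \ge \mathbf{0}\}$ are integral [53]. Denoting by $N_{\mathcal{C}}^{(L)}$ the number of size-$L$
clusters in Band-A, we let $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \cdots, \mathbf{x}_K^T]^T$ with $\mathbf{x}_k = [x_{k\mathcal{C}_1}, x_{k\mathcal{C}_2}, \cdots, x_{k\mathcal{C}_{N_{\mathcal{C}}^{(L)}}}]^T$, and
$\mathbf{b} = [S_1(L), \cdots, S_J(L), 1, \cdots, 1]^T$ with size $(J+K) \times 1$. We let $\mathbf{B} = \begin{bmatrix} \mathbf{C} \\ \mathbf{D} \end{bmatrix}$, where the sizes of matrices $\mathbf{C}$ and
$\mathbf{D}$ are $J \times (K N_{\mathcal{C}}^{(L)})$ and $K \times (K N_{\mathcal{C}}^{(L)})$, respectively. The element in the $j$th row and $\big((k-1) N_{\mathcal{C}}^{(L)} + i\big)$th
column of matrix $\mathbf{C}$ is $1$, $\forall k \in \mathcal{U}$, if $j \in \mathcal{C}_i$, and $0$ otherwise. In matrix $\mathbf{D}$, the element in the $k$th row and
$\big((k-1) N_{\mathcal{C}}^{(L)} + i\big)$th column is $1$ for every $i$, and all other elements are $0$. Recall
that we consider large networks including the following type of cluster combination: $\{j_1, j_2\}$, $\{j_1, j_3\}$
and $\{j_2, j_3\}$ if $L_{\max} > 1$, where $j_1$, $j_2$ and $j_3$ are BS indexes. Then, $\mathbf{B}$ with $L_{\max} > 1$ always includes
the submatrix

$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$

whose determinant is $-2$, and thus $\mathbf{B}$ is not TU. According to the Hoffman and
Kruskal (1956) theorem, there are some non-integer extreme points $\mathbf{v} \in \mathcal{X}$ that cannot be characterized
by a convex combination of any elements in $\mathcal{F}$. Thus, we have $\mathbf{v} \notin \mathrm{conv}(\mathcal{F}) = \mathcal{X}'$ and $\mathcal{X} \not\subseteq \mathcal{X}'$.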
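
The determinant claimed for the three-BS cluster combination can be checked directly. The snippet below is a small sketch that builds the BS-by-cluster incidence matrix for three BSs and all clusters of size 2 and evaluates its determinant by cofactor expansion; the labels `j1`, `j2`, `j3` and the helper `det3` are illustrative names only.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


base_stations = ["j1", "j2", "j3"]
# All clusters of size 2: (j1, j2), (j1, j3), (j2, j3).
clusters = list(combinations(base_stations, 2))

# Row = BS, column = cluster, entry 1 if the BS belongs to the cluster.
incidence = [[1 if bs in cl else 0 for cl in clusters] for bs in base_stations]

for row in incidence:
    print(row)
print(det3(incidence))  # -2, so this square submatrix violates total unimodularity
```

Since a TU matrix may only contain square submatrices with determinant in {-1, 0, +1}, a single such 3x3 block is enough to rule out total unimodularity.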


Appendix B 

Proof of Proposition 2 

We use techniques similar to the proof of Prop. 3 in [30], where a graph is used to represent the
association and the KKT conditions (13) are used to restrict the graph structure. For a given cluster size
$L$ in Band-A, we denote the graph by $G_1$, where nodes represent users, and an edge represents the BS
cluster that serves the two nodes (users) it connects. Each node has an ID indicating the user index, while each
edge has a color that identifies the BS cluster.

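Recalling that constraints (5c) are inactive, we have $\theta_{kL}^{(A)} = 0, \forall k \in \mathcal{U}$. If there are two users $k$ and $m$
being served by size-$L$ clusters $\mathcal{C}_1$ and $\mathcal{C}_2$ in Band-A (i.e., $x_{k\mathcal{C}_1}^{(A)} > 0$, $x_{k\mathcal{C}_2}^{(A)} > 0$, $x_{m\mathcal{C}_1}^{(A)} > 0$, $x_{m\mathcal{C}_2}^{(A)} > 0$), we
have

$$R_k = \frac{r_{k\mathcal{C}_1}^{(A)}}{\sum_{j \in \mathcal{C}_1} \nu_{jL}^{(A)} / S_j(L)} = \frac{r_{k\mathcal{C}_2}^{(A)}}{\sum_{j \in \mathcal{C}_2} \nu_{jL}^{(A)} / S_j(L)} \quad \text{and} \quad R_m = \frac{r_{m\mathcal{C}_1}^{(A)}}{\sum_{j \in \mathcal{C}_1} \nu_{jL}^{(A)} / S_j(L)} = \frac{r_{m\mathcal{C}_2}^{(A)}}{\sum_{j \in \mathcal{C}_2} \nu_{jL}^{(A)} / S_j(L)}$$

from KKT condition (13), where $R_k = \sum_{A'=1}^{3} \sum_{\mathcal{C}' \subseteq \mathcal{B}^{(A')}} x_{k\mathcal{C}'}^{(A')} r_{k\mathcal{C}'}^{(A')}$. Thus, we have

$$\frac{r_{k\mathcal{C}_1}^{(A)}}{r_{k\mathcal{C}_2}^{(A)}} = \frac{r_{m\mathcal{C}_1}^{(A)}}{r_{m\mathcal{C}_2}^{(A)}}, \tag{17}$$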


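which is true with probability 0. Therefore, it is almost sure that any two users can share at most one
common cluster of size $L$ in Band-A. Similarly, we consider an example of three users $k$, $m$, $i$ and clusters
$\mathcal{C}_1$, $\mathcal{C}_2$, $\mathcal{C}_3$. User $k$ is associated with $\mathcal{C}_1$ and $\mathcal{C}_2$, user $m$ is associated with $\mathcal{C}_1$ and $\mathcal{C}_3$, and user $i$ is associated
with $\mathcal{C}_2$ and $\mathcal{C}_3$. We consider the following three cases:

1) Clusters $\mathcal{C}_1$, $\mathcal{C}_2$ and $\mathcal{C}_3$ are different: combining the KKT conditions of the three users, we have
$\frac{r_{k\mathcal{C}_1}^{(A)}}{r_{k\mathcal{C}_2}^{(A)}} \cdot \frac{r_{i\mathcal{C}_2}^{(A)}}{r_{i\mathcal{C}_3}^{(A)}} = \frac{r_{m\mathcal{C}_1}^{(A)}}{r_{m\mathcal{C}_3}^{(A)}}$, which is true with probability 0.

2) $\mathcal{C}_1 = \mathcal{C}_2 \neq \mathcal{C}_3$: users $m$ and $i$ are served by both $\mathcal{C}_1$ and $\mathcal{C}_3$, which is true with probability 0 from (17).

3) $\mathcal{C}_1 = \mathcal{C}_2 = \mathcal{C}_3$: users $k$, $m$ and $i$ are served by the same cluster, which is possible. In this case, the
graph becomes a complete graph.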

Therefore, the graph $G_1$ with three users either contains a loop whose edges all have the same color, or no loop.
We can get a similar result for graph $G_1$ with more than three users, where any subgraph formed by
users served by the same BS cluster is a complete graph. Thus, we generate a new graph, $G_2$, where
each node represents a cluster. Hence, $G_2$ has as many nodes as there are size-$L$ clusters in Band-A. There is an edge
between two nodes in $G_2$ if they have a common vertex in $G_1$ (i.e., there is at least one user served by both clusters).
Thus, the number of users who are served by more than one cluster is bounded by the number of edges of $G_2$. Any
loop in $G_2$ corresponds to a loop with more than one edge color in $G_1$. Since there are no such colorful
loops in $G_1$, there is no loop in $G_2$. In other words, $G_2$ is a tree. Thus, the maximal number of edges in
$G_2$ (i.e., the maximal number of fractional users) is one less than the number of nodes, that is, one less than
the number of size-$L$ clusters in Band-A.
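
The counting argument can be illustrated numerically. The sketch below builds the cluster graph $G_2$ from a user-to-cluster association, tests it for loops with a union-find structure, and compares the number of fractional users with the number of clusters. The association dictionaries and the function `cluster_graph_is_forest` are hypothetical, and the example assumes, as a simplification, that a fractional user is shared by exactly two clusters.

```python
def find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def cluster_graph_is_forest(assoc):
    """assoc maps user -> set of cluster labels.

    Returns (is_forest, number_of_fractional_users, number_of_clusters).
    """
    clusters = sorted({c for cs in assoc.values() for c in cs})
    parent = {c: c for c in clusters}
    edges = set()
    for cs in assoc.values():
        ordered = sorted(cs)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                edges.add((ordered[i], ordered[j]))  # clusters sharing this user
    is_forest = True
    for a, b in sorted(edges):
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb:
            is_forest = False  # this edge would close a loop
        else:
            parent[ra] = rb
    fractional = sum(1 for cs in assoc.values() if len(cs) > 1)
    return is_forest, fractional, len(clusters)


# Hypothetical association: users u2 and u3 are fractional.
loop_free = {"u1": {"C1"}, "u2": {"C1", "C2"}, "u3": {"C2", "C3"}, "u4": {"C3"}}
print(cluster_graph_is_forest(loop_free))  # (True, 2, 3): 2 <= 3 - 1

# Adding a user shared by C1 and C3 closes a loop, which the proof rules out.
with_loop = dict(loop_free, u5={"C1", "C3"})
print(cluster_graph_is_forest(with_loop))  # (False, 3, 3)
```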

Appendix C 

Implementation Issues of the Dual-Subgradient Algorithm

(A) 

We let Lmax = max^ Lmax, Acm = A and Na be the number of operations. To solve (5) 
directly by CVX [51], we have a problem of size $O(N_{Cm} N_A L_{\max} K)$, which is dominated by the number
of variables $x_{k\mathcal{C}}^{(A)}$. On the other hand, as discussed below, the proposed algorithm has lower complexity.
Let $L_a$ be the maximal number of active cluster sizes over all bands (i.e., $\max_A |\{L : \lambda_{AL} > 0\}|$). The
size of the LP (14) is 0(AcmAALamin{A^Cm~l, A} + A^Af^amax{0, A —J+1}), where the first term 
signifies the size of positive for fractional users and the second term signifies the size of positive 
x^^ for users with unique association. It is easy to check that the size of (14) is smaller than the size 
of (5). As shown in Sec. VII, the number of fractional users is very small (less than 3.5% of $K$), and thus
the size of (14) is much smaller (less than 3.5%) than (5). Moreover, the size of (14) can be further
reduced when $L_a / L_{\max}$ is small (e.g., only 2 active cluster sizes among 4 possible sizes in Sec. VII).
The fast convergence in the first part (i.e., steps (8)-(11)) of the algorithm (less than 60 iterations in
our simulation), along with the low complexity per iteration and the reduced size of (14), can make
the proposed algorithm more efficient than CVX for larger networks.

Appendix D 

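Detailed Dual Subgradient Algorithm with Redundant Constraints

In the formulated problem (5) in our paper, constraints (5c)-(5e) imply $x_{k\mathcal{C}}^{(A)} \le 1$. Thus, adding the
additional constraint $x_{k\mathcal{C}}^{(A)} \le 1$ to (5) will not change the problem. In other words, the following problem
with constraint $x_{k\mathcal{C}}^{(A)} \le 1$ is equivalent to our original optimization problem (5).

\begin{align}
\max \quad & \sum_{k \in \mathcal{U}} \log \Bigg( \sum_{A=1}^{3} \ \sum_{\mathcal{C}: \mathcal{C} \subseteq \mathcal{B}^{(A)},\, |\mathcal{C}| \le L_{\max}^{(A)}} x_{k\mathcal{C}}^{(A)} r_{k\mathcal{C}}^{(A)} \Bigg) \tag{18a} \\
\text{s.t.} \quad & \sum_{\mathcal{C}: \mathcal{C} \subseteq \mathcal{B}^{(A)},\, j \in \mathcal{C},\, |\mathcal{C}| = L} \ \sum_{k \in \mathcal{U}} \frac{x_{k\mathcal{C}}^{(A)}}{S_j(L)} \le \lambda_{AL}, \quad \forall j \in \mathcal{B}^{(A)},\ 1 \le L \le L_{\max}^{(A)},\ \forall A, \tag{18b} \\
& \sum_{\mathcal{C}: |\mathcal{C}| = L,\, \mathcal{C} \subseteq \mathcal{B}^{(A)}} x_{k\mathcal{C}}^{(A)} \le \lambda_{AL}, \quad \forall k \in \mathcal{U},\ 1 \le L \le L_{\max}^{(A)},\ \forall A, \tag{18c} \\
& x_{k\mathcal{C}}^{(A)} \in [0, 1], \quad \forall k \in \mathcal{U},\ \forall \mathcal{C}: |\mathcal{C}| \le L_{\max}^{(A)},\ \forall A, \tag{18d} \\
& \sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \le \mu_A, \quad \forall A, \tag{18e} \\
& \sum_{A=1}^{3} \mu_A \le 1, \tag{18f} \\
& x_{k\mathcal{C}}^{(A)} \ge 0, \quad \forall k \in \mathcal{U},\ \forall \mathcal{C}: |\mathcal{C}| \le L_{\max}^{(A)},\ \forall A. \tag{18g}
\end{align}

Note that the above problem formulation is the same as (5) in the paper, except for the redundant
constraint (18d). The dual subgradient algorithm in the paper is proposed based on the above equivalent
optimization problem (18).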

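Specifically, we let $\nu_{jL}^{(A)}$ and $\theta_{kL}^{(A)}$ be the Lagrange multipliers corresponding to (18b) and (18c),
respectively. The dual problem of (18) is

$$\min_{\nu \ge 0,\ \theta \ge 0} \ \sum_{k \in \mathcal{U}} f_k(\nu, \theta) + g(\nu, \theta),$$

where, writing $L = |\mathcal{C}|$,

\begin{align}
f_k(\nu, \theta) &= \max_{x_{k\mathcal{C}}^{(A)} \in [0, 1]} \ \log \Bigg( \sum_{A=1}^{3} \sum_{\mathcal{C} \subseteq \mathcal{B}^{(A)}} x_{k\mathcal{C}}^{(A)} r_{k\mathcal{C}}^{(A)} \Bigg) - \sum_{A=1}^{3} \sum_{\mathcal{C} \subseteq \mathcal{B}^{(A)}} x_{k\mathcal{C}}^{(A)} \Bigg( \sum_{j \in \mathcal{C}} \frac{\nu_{jL}^{(A)}}{S_j(L)} + \theta_{kL}^{(A)} \Bigg), \tag{19} \\
g(\nu, \theta) &= \max_{\sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \le \mu_A,\ \sum_{A=1}^{3} \mu_A \le 1} \ \sum_{A=1}^{3} \sum_{L=1}^{L_{\max}^{(A)}} \lambda_{AL} \Bigg( \sum_{j: j \in \mathcal{B}^{(A)}} \nu_{jL}^{(A)} + \sum_{k \in \mathcal{U}} \theta_{kL}^{(A)} \Bigg). \tag{20}
\end{align}

The function (19) is similar to (6) in the paper, except that we have the additional constraint $x_{k\mathcal{C}}^{(A)} \le 1$.
The constraints of (18) satisfy the Slater condition, and thus strong duality holds (i.e., the dual problem
and the original problem (18) have the same optimal value, and (18) has the same optimal solution as
problem (5) in the paper).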

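The optimization problem (19) has the closed-form optimal solution

$$x_{k\mathcal{C}}^{(A)} = \begin{cases} \left[ \dfrac{1}{\sum_{j \in \mathcal{C}^*} \nu_{jL}^{(A^*)} / S_j(L) + \theta_{kL}^{(A^*)}} \right]_0^1, & \text{if } \{\mathcal{C}, A\} = \{\mathcal{C}^*, A^*\}, \\ 0, & \text{otherwise,} \end{cases} \tag{21}$$

where $[x]_0^1 = \min\{1, \max\{0, x\}\}$, $\{\mathcal{C}^*, A^*\} = \arg\max_{\mathcal{C}, A} \dfrac{r_{k\mathcal{C}}^{(A)}}{\sum_{j: j \in \mathcal{C}} \nu_{jL}^{(A)} / S_j(L) + \theta_{kL}^{(A)}}$, and $L = |\mathcal{C}|$.

The problem (20) is an LP and one optimal solution is

$$\lambda_{AL} = \begin{cases} 1, & \text{if } \{A, L\} = \arg\max_{A', L'} \Big( \sum_{j \in \mathcal{B}^{(A')}} \nu_{jL'}^{(A')} + \sum_{k \in \mathcal{U}} \theta_{kL'}^{(A')} \Big), \\ 0, & \text{otherwise,} \end{cases} \tag{22}$$

$$\mu_A = \begin{cases} 1, & \text{if there exists a band } A \text{ such that the above } \lambda_{AL} > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$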


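The $n$th iteration of the algorithm is as follows.

1) Update the activity fractions by (21).

2) Update resource allocation for different bands and clusters by (22) and (23).

3) Update the Lagrangian multipliers by

$$\nu_{jL}^{(A)}(n+1) = \Bigg[ \nu_{jL}^{(A)}(n) - \delta(n) \Bigg( \lambda_{AL}(n) - \sum_{\mathcal{C}: \mathcal{C} \subseteq \mathcal{B}^{(A)},\, j \in \mathcal{C},\, |\mathcal{C}| = L} \ \sum_{k \in \mathcal{U}} \frac{x_{k\mathcal{C}}^{(A)}(n)}{S_j(L)} \Bigg) \Bigg]^+,$$

$$\theta_{kL}^{(A)}(n+1) = \Bigg[ \theta_{kL}^{(A)}(n) - \delta(n) \Bigg( \lambda_{AL}(n) - \sum_{\mathcal{C}: \mathcal{C} \subseteq \mathcal{B}^{(A)},\, |\mathcal{C}| = L} x_{k\mathcal{C}}^{(A)}(n) \Bigg) \Bigg]^+,$$

where $[z]^+ = \max\{z, 0\}$ and $\delta(n)$ is the stepsize at the $n$th iteration.

From the above steps, we can observe that the difference in the dual algorithm caused by adding the redundant
constraint $x_{k\mathcal{C}}^{(A)} \le 1$ is in (21), where the variable $x_{k\mathcal{C}}^{(A)}$ needs to be projected onto the set $[0, 1]$. Based
on this constraint, the subgradients $\lambda_{AL}(n) - \sum_{\mathcal{C}} \sum_{k \in \mathcal{U}} x_{k\mathcal{C}}^{(A)}(n) / S_j(L)$ and $\lambda_{AL}(n) - \sum_{\mathcal{C}} x_{k\mathcal{C}}^{(A)}(n)$ are bounded,
which can be used to show the convergence of the dual algorithm.

Footnote on (21): If we have multiple pairs of $\{\mathcal{C}^*, A^*\}$, we can pick the pair with the largest $r_{k\mathcal{C}}^{(A)}$.

Footnote on (22): If we have multiple $\{A, L\}$ pairs that attain the maximum, we just randomly pick one.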
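
The two projections used in the iteration, $[x]_0^1$ for the activity fractions and $[z]^+$ for the multipliers, can be sketched in a few lines. The snippet below is only an illustration for a single user and a single multiplier: it assumes that each candidate cluster comes with a positive "price" (the multiplier term in the denominator of the closed-form solution), and the names `best_response` and `update_multiplier` as well as all numerical values are hypothetical.

```python
def clip01(x):
    """Projection onto [0, 1], i.e. min{1, max{0, x}}."""
    return min(1.0, max(0.0, x))


def pos(z):
    """Projection onto the nonnegative reals, i.e. max{z, 0}."""
    return max(z, 0.0)


def best_response(rates, prices):
    """Pick the cluster with the largest rate-to-price ratio.

    Returns (index, activity fraction), where the fraction 1 / price is
    projected onto [0, 1]. Prices are assumed to be strictly positive.
    """
    best = max(range(len(rates)), key=lambda i: rates[i] / prices[i])
    return best, clip01(1.0 / prices[best])


def update_multiplier(value, step, budget, usage):
    """One projected subgradient step: [value - step * (budget - usage)]^+."""
    return pos(value - step * (budget - usage))


# Hypothetical numbers: two candidate clusters for one user.
index, fraction = best_response(rates=[2.0, 3.0], prices=[0.5, 2.0])
print(index, fraction)  # cluster 0 wins; 1 / 0.5 = 2 is projected down to 1.0

# The multiplier grows when usage exceeds the budget and shrinks otherwise.
print(update_multiplier(0.5, 0.1, budget=1.0, usage=1.5))
print(update_multiplier(0.02, 0.1, budget=1.0, usage=0.2))  # clipped at 0.0
```

Without the projection onto $[0, 1]$, the activity fraction in the first example would be 2, which illustrates why the redundant constraint keeps the subgradients bounded.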





References 

[1] J. G. Andrews, “Seven ways that HetNets are a cellular paradigm shift,” IEEE Comm. Mag., vol. 51, pp. 136-144, Mar. 2013. 

[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal
on Sel. Areas in Communications, vol. 32, pp. 1065-1082, June 2014.

[3] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Comm.
Mag., vol. 52, pp. 74-80, Feb. 2014.

[4] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. on Wireless
Communications, vol. 9, pp. 3590-3600, Nov. 2010.

[5] J. Hoydis, S. Ten Brink, M. Debbah, et al., “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?,”
IEEE Journal on Sel. Areas in Communications, vol. 31, pp. 160-171, Feb. 2013.

[6] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Comm. Mag., 
vol. 52, pp. 186-195, Feb. 2014. 

[7] 3GPP, “Technical specification group radio access network; Small cell enhancements for E-UTRA and E-UTRAN,” TR 36.872, 
V12.1.0, Dec. 2013. 

[8] J. G. Andrews, S. Singh, Q. Ye, X. Lin, and H. Dhillon, “An overview of load balancing in HetNets: Old myths and open problems,”
IEEE Wireless Communications, vol. 21, pp. 18-25, Apr. 2014.

[9] S. Singh, H. S. Dhillon, and J. G. Andrews, “Offloading in heterogeneous networks: Modeling, analysis, and design insights,” IEEE
Trans. on Wireless Communications, vol. 12, pp. 2484-2497, May 2013.

[10] H. S. Jo, Y. J. Sang, P. Xia, and J. G. Andrews, “Heterogeneous cellular networks with flexible cell association: a comprehensive
downlink SINR analysis,” IEEE Trans. on Wireless Communications, vol. 11, pp. 3484-3495, Oct. 2012.

[11] E. Aryafar, A. Keshavarz-Haddad, M. Wang, and M. Chiang, “RAT selection games in HetNets,” in Proc., IEEE INFOCOM, 
pp. 998-1006, Apr. 2013. 

[12] A. Damnjanovic, J. Montojo, Y. Wei, T. Ji, T. Luo, M. Vajapeyam, T. Yoo, O. Song, and D. Malladi, “A survey on 3GPP heterogeneous 
networks,” IEEE Wireless Communications Magazine, vol. 18, pp. 10-21, June 2011. 

[13] A. Ghosh, N. Mangalvedhe, R. Ratasuk, B. Mondal, M. Cudak, E. Visotsky, T. A. Thomas, J. G. Andrews, et al., “Heterogeneous
cellular networks: From theory to practice,” IEEE Comm. Mag., vol. 50, pp. 54-64, June 2012.

[14] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. Andrews, “User association for load balancing in heterogeneous
cellular networks,” IEEE Trans. on Wireless Communications, vol. 12, pp. 2706-2716, June 2013.

[15] D. Bethanabhotla, O. Y. Bursalioglu, H. C. Papadopoulos, and G. Caire, “Optimal user-cell association for massive MIMO wireless 
networks,” IEEE Trans. Wireless Comm., vol. PP, pp. 1-1, Nov. 2015. 

[16] D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, and W. Yu, “Multi-cell MIMO cooperative networks: A new look 
at interference,” IEEE Journal on Sel. Areas in Communications, vol. 28, pp. 1380-1408, Dec. 2010. 

[17] M. Sawahashi, Y. Kishiyama, A. Morimoto, D. Nishikawa, and M. Tanno, “Coordinated multipoint transmission/reception techniques 
for LTE-Advanced [coordinated and distributed MIMO],” IEEE Wireless Communications, vol. 17, pp. 26-34, June 2010. 

[18] D. Lee, H. Seo, B. Clerckx, E. Hardouin, D. Mazzarese, S. Nagata, and K. Sayana, “Coordinated multipoint transmission and
reception in LTE-Advanced: deployment scenarios and operational challenges,” IEEE Comm. Mag., vol. 50, pp. 148-155, Feb. 2012.

[19] P. Marsch and G. Fettweis, “Static clustering for cooperative multi-point (CoMP) in mobile communications,” in Proc., IEEE Intl.
Conf. on Communications, pp. 1-6, June 2011.





[20] J. Li, T. Svensson, C. Botella, T. Eriksson, X. Xu, and X. Chen, “Joint scheduling and power control in coordinated multi-point 
clusters,” in Proc., IEEE Veh. Technology Conf., pp. 1-5, Sep. 2011. 

[21] J. Zhao, T. Q. S. Quek, and Z. Lei, “Coordinated multipoint transmission with limited backhaul data transfer,” IEEE Trans. on
Wireless Communications, vol. 12, pp. 2762-2775, June 2013.

[22] Y. Du and G. de Veciana, “Wireless networks without edges: Dynamic radio resource clustering and user scheduling,” in Proc., 
IEEE INFOCOM, pp. 1321-1329, Apr. 2014. 

[23] E. Björnson, M. Kountouris, and M. Debbah, “Massive MIMO and small cells: Improving energy efficiency by optimal soft-cell
coordination,” in IEEE International Conference on Telecommunications, May 2013.

[24] M. Hong, R. Sun, H. Baligh, and Z. Q. Luo, “Joint base station clustering and beamformer design for partial coordinated transmission
in heterogeneous networks,” IEEE Journal on Sel. Areas in Communications, vol. 31, pp. 226-240, Feb. 2013.

[25] M. Sanjabi, M. Razaviyayn, and Z. Q. Luo, “Optimal joint base station assignment and beamforming for heterogeneous networks,”
IEEE Trans. on Signal Processing, vol. 62, pp. 1950-1961, Apr. 2014.

[26] S. Wagner, R. Couillet, M. Debbah, and D. T. M. Slock, “Joint precoding and load balancing optimization for energy-efficient
heterogeneous networks,” IEEE Trans. on Wireless Comm., vol. 14, pp. 5810-5822, Oct. 2015.

[27] D. Lopez-Perez et al., “Enhanced intercell interference coordination challenges in heterogeneous networks,” IEEE Wireless
Communications, vol. 18, pp. 22-30, June 2011.

[28] S. Vasudevan, R. Pupala, and K. Sivanesan, “Dynamic eICIC - a proactive strategy for improving spectral efficiencies of heterogeneous
LTE cellular networks by leveraging user mobility and traffic dynamics,” IEEE Trans. on Wireless Communications, vol. 12,
pp. 4956-4969, Oct. 2013.

[29] A. Liu, V. K. N. Lau, L. Ruan, J. Chen, and D. Xiao, “Hierarchical radio resource optimization for heterogeneous networks with
enhanced inter-cell interference coordination (eICIC),” IEEE Trans. on Signal Processing, vol. 62, pp. 1684-1693, Apr. 2014.

[30] Q. Ye, M. Al-Shalash, C. Caramanis, and J. G. Andrews, “On/off macrocells and load balancing in heterogeneous cellular networks,”
in Proc., IEEE Globecom, pp. 3814-3819, Dec. 2013.

[31] A. Bedekar and R. Agrawal, “Optimal muting and load balancing for eICIC,” in Intl. Symposium on Modeling and Optimization in
Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 280-287, May 2013.

[32] J. Ghimire and C. Rosenberg, “Resource allocation, transmission coordination and user association in heterogeneous networks: A
flow-based unified approach,” IEEE Trans. on Wireless Communications, vol. 12, pp. 1340-1351, Mar. 2013.

[33] S. Deb, P. Monogioudis, J. Miernik, and J. P. Seymour, “Algorithms for enhanced inter-cell interference coordination (eICIC) in
LTE hetnets,” IEEE/ACM Trans. on Networking, vol. 22, pp. 137-150, Feb. 2014.

[34] S. Singh and J. G. Andrews, “Joint resource partitioning and offloading in heterogeneous cellular networks,” IEEE Trans. on Wireless
Communications, vol. 13, pp. 888-901, Feb. 2014.

[35] E. Bjornson, R. Zakhour, D. Gesbert, and B. Ottersten, “Cooperative multicell precoding: Rate region characterization and distributed
strategies with instantaneous and statistical CSI,” IEEE Trans. on Signal Processing, vol. 58, pp. 4298-4310, Aug. 2010.

[36] E. Hossain, M. Rasti, H. Tabassum, and A. Abdelnasser, “Evolution toward 5G multi-tier cellular wireless networks: An interference 
management perspective,” IEEE Wireless Communications, vol. 21, pp. 118-127, June 2014. 

[37] S. Shakkottai, T. S. Rappaport, and P. C. Karlsson, “Cross-layer design for wireless networks,” IEEE Comm. Mag., vol. 41, pp. 74-80, 
Oct. 2003. 

[38] X. Lin, N. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” IEEE Journal on Sel. Areas in 
Communications, vol. 24, pp. 1452-1463, Aug. 2006. 

[39] E. Bjornson and E. Jorswieck, Optimal resource allocation in coordinated multi-cell systems, vol. 9 (2-3). Now Publishers, 2013. 





[40] E. Bjornson, N. Jalde, M. Bengtsson, and B. Ottersten, “Optimality properties, distributed strategies, and measurement-based
evaluation of coordinated multicell OFDMA transmission,” IEEE Transactions on Signal Processing, vol. 59, pp. 6086-6101, Dec.
2011.

[41] Q. Ye, O. Y. Bursalioglu, and H. Papadopoulos, “Harmonized cellular and distributed massive MIMO: Load balancing and 
scheduling,” in Proc., IEEE Globecom, Dec. 2015. 

[42] H. Huh, G. Caire, H. C. Papadopoulos, and S. A. Ramprashad, “Achieving large spectral efficiency with TDD and not-so-many 
base-station antennas,” in IEEE APWC, pp. 1346-1349, Sep. 2011. 

[43] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state
feedback,” IEEE Trans. on Info. Theory, vol. 56, pp. 2845-2866, June 2010.

[44] J. Zhang, R. Chen, J. G. Andrews, A. Ghosh, and R. W. Heath, “Networked MIMO with clustered linear precoding,” IEEE Trans.
on Wireless Communications, vol. 8, pp. 1910-1921, Apr. 2009.

[45] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of
channel estimation, and reduced-complexity scheduling,” IEEE Trans. on Info. Theory, vol. 58, pp. 2911-2934, May 2012.

[46] M. Chiang, P. Hande, T. Lan, and C. W. Tan, “Power control in wireless cellular networks,” Foundations and Trends® in Networking, 
vol. 2, pp. 381-533, Apr. 2008. 

[47] Y.-G. Lim, C.-B. Chae, and G. Caire, “Performance analysis of massive MIMO for cell-boundary users,” IEEE Trans. on Wireless
Comm., vol. 14, pp. 6827-6842, Dec. 2015.

[48] S. Stanczak, M. Wiczanowski, and H. Boche, Fundamentals of Resource Allocation in Wireless Networks: Theory and Algorithms, 
vol. 3. Springer Verlag, 2009. 

[49] J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Transactions on Networking, vol. 8,
pp. 556-567, Oct. 2000.

[50] D. P. Bertsekas, Convex Optimization Theory. Athena Scientific, 2009. 

[51] M. Grant, S. Boyd, and Y. Ye, “CVX: Matlab software for disciplined convex programming,” 2009. Available: http://cvxr.com/cvx/. 

[52] H. Shirani-Mehr, G. Caire, and M. J. Neely, “MIMO downlink scheduling with non-perfect channel state knowledge,” IEEE Trans.
on Communications, vol. 58, pp. 2055-2066, July 2010.

[53] A. Schrijver, Theory of Linear and Integer Programming. John Wiley & Sons, 1998. 



